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Note on Jordan elimination, linear programming 
and Tchebycheff approximation 
By 
E. STIEFEL 


The Tchebycheff problem of solving an inconsistent system of linear equations 
1) = 2a Bin Xa + j= 0, j=1,2,...,% (1) 


in such a way that the “solution” x, minimizes 
¢=Max|y,|, {=1,2,...,% (2) 


has recently been discussed in many interesting papers. E. REMEz has written 
a book [J] on the subject generalizing his classical methods of Tchebycheff 
approximation of functions by polynomials, A. A. GOLDSTEIN and W. CHENEY 
[2, 3] worked out several algorithms and discussed in particular the relations 
to linear programming. 

There are two essentially different methods of attacking the problem stated 
above. The “minimizing methods” start from a trial solution x, and use an 
algorithm for diminishing ¢ during every step of an appropriate computational 
routine. S. I. ZUHOVICKII published such an algorithm requiring only a finite 
number of steps in order to find the solution of our problem [4, 5]. 

But there is another line of attack based on theorems of DE LA VALLEE-Pous- 
SIN [6]. He uses a “‘reference’’, that is by definition a set of (m+ 1) equations y, 
chosen among the given equations (1). The solution of the following reduced 
Tchebycheff problem 

Minimize ¢* = Max|y,| (3) 


can be computed explicitly and it can be shown that ¢* is a lower bound of the 
minimal deviation ¢ corresponding to the solution of the unrestricted problem 
(1), (2). Therefore “maximizing methods” are available. Their strategy consists 
of replacing the reference by another one so that the reference-deviation ¢* is 
raised. The author published such an algorithm, called exchange-method [7] 
characterized also by the property of reaching the final solution in a finite number 
of steps. Furthermore in his report [8] the author advocated the construction 
of ‘‘minimax-methods” furnishing a lower and an upper bound for the wanted 
minimum (2) and closing the gap between those bounds during the computation. 
The construction of such an algorithm by E. W. CHENEY and A. A. GOLDSTEIN [9] 
seems to be the progress in this field of research most recently communicated. 
In the sequel it will be shown that many of the algorithms for solving the 
Tchebycheff problem are closely related to methods of linear programming. The 
main result can be stated as follows. ZUHOVICKII’s algorithm and the exchange 
Numer, Math. Bd.2 1 
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method as well are completely equivalent to the well-known simplex algorithm of 
G. B. Dantzic. Moreover these two algorithms are dual programs in the sense of 
the terminology of linear programming. 

Hence, from a theoretical point of view, we may forget about methods for 
solving the Tchebycheff problem and use for this purpose the computational 
programs prepared already for linear programming. However from the practical 
point of view the situation is a little different. 

It will be shown indeed that for instance the exchange method — properly 
adapted to the computational routines of linear programming — is more econo- 
mical than the simplex method. 

It seems to be unavoidable to give in this paper a short account of some of 
the basic principles of linear programming. The reader may also consult the 
books [10], [11]. The presentation of the simplex method is modeled on a talk 
given by Professor A. W. TUCKER (Princeton University) in May 1959, and 
Tucker’s formulation made it clear to me that the simplex method and the 
exchange method are equivalent. Then, in June 1959, this equivalence was 
also briefly mentioned to me by Professor F. L. BAUER as part of a forthcoming 
Mainz Thesis, based on his investigations of polyhedral norms. The author is 
indebted to Professor BAUER for this stimulans and to Dr. D. MORGENSTERN 
(Technische Universitat Berlin) for comments and suggestions. 


I. Jordan elimination 


Let 
= . 
Yj= Lain Xe» 7=1,2,...,m (4) 
be » given linear homogeneous function of the independent variables x,, %2, ..., Xm: 


This relation between independent variables x, and dependent variables y; can 
be described by the following table: 





xy Xo x; Xm 
MW=| 41 Ge --s Gs +++ Gm 
Vo> Aoi Ase eee Ags cee aom 
m . (5) 
Ws 41 Aya ++ oes Ayy 
yo™ Gny Ang +++ Ans +++ Anm 











The kernel of the table is the matrix A of the given constants. After chosing 
a pair 7, s of subscripts, we want now to introduce y, as an independent and x, 
as a dependent variable. For this purpose the equation 


Vo = Ayy %y + yg X%q t+ + yg Xe sts + Aya Xm (6) 


is solved with respect to x, and the result is inserted into the remaining linear 
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forms. The variable y, will appear in the headline and the variable x, on the 
left side of the new table 

















x Hg s+ Vy aS Xm 
a? a ee ee ore bs m 
+ eae be, dae. . 425 ana bom pore 
“= i Sp (1) toe ~ ee by a,,. (7) 
Yn = bin a. a 

















The new elements 6;, are the second order determinants 
Dik = jn Ors — 4; App: (8) 


The element a,, is called the pivot and we have the following set of rules: 

— 1 — The pivot is replaced by 1. 

— 2 — The other elements of the pivotal column remain unchanged. 

— 3 — The signs of the other elements in the pivotal row have to be changed. 

— 4—A “normal” element b;, of the new table is the second order determinant 
built by the old element a;, and the pivot. 

— 5 — Afterwards every element of the new table must be divided by the pivotal 

element. 

The whole operation is called a step of Jordan elimination. It is easy to 
prove the following theorem of STEINITZ: 

Assume that the given linear forms (4) are linearly independent. (This 
involves nm.) Then it is possible to introduce all the variables y,, yo, ..., V, 
as independent variables by carrying out exactly appropriate Jordan steps. 

As it is well known, the theorem of STEINITZ can be taken as a basic tool 
for the development of the theoretical linear algebra. (The reader may find 
valuable material on this subject in the book [/2]). This is the reason for the 
fact that Jordan elimination is the basic tool for numerical procedures of linear 
algebra. Jordan elimination is able to solve almost any linear problem including 
linear programming and linear Tchebycheff approximation. Only the choice of 
the pivot must be adapted to the special problem at hand. 

If for instance m =n and the matrix A is nonsingular, the variables y,, Vo, ...,V, 
can be introduced as independent variables and %,, x2, ..., x, aS functions. The 
kernel of the final table is then the inverse matrix of A. 

For solving a system of linear equations the Jordan algorithm is slightly 
modified in the following way. 


Let be 
Wy= Leyam+y=0, f= 1,2,...8 (9) 


the given system. The column of the constants c; is added to the table and treated 

during the elimination as any other column. (It will of course never appear as 

pivotal column.) After an elimination step producing a new table (7) the column 
4* 
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y, is cancelled because y, must vanish as is required by the given equations (9). 
The result of the Jordan steps is then a single column containing the values 
of the unknowns. 

Sometimes computer prefer to cancel also the pivotal row x, of the new 
table and to record it separately in order to use it later on. This procedure is 
calied Gaussian elimination. At the end of this algorithm only one of the un- 
knowns is obtained, the other ones are found by inserting backwards using the 
recorded pivotal rows. 

Duality. In order to establish duality it is convenient to take as given linear 


forms 
ad . 
¥j= Dajn(—%), 7=1,2,...,m. (10) 
k=1 


The tables corresponding to (5) (7) are then 


(= X) (— %g) 1 +> (— %,) We ie (— Xm) 








a1 Ayo soe As aim 


VW 














Vr=| G1 Gp +s. 4, m (411) 















































74 ani ane ans anm 
and 
(— %) (— %2) (— ») (— %p) 
MW=}| by bie —Qs Dim 
divided 
ol Ge Be ov %3 (1) —— by 4,,. (12) 
—— bay Dns — ays One 




















Again we have 
Dip = A;,4, 5 — A; .Ayy. 

Obviously only the rules — 2 — and — 3 — stated above are permuted. 
The pivotal row remains now unchanged and the pivotal column gets opposite 
signs. The rules — 1 —, — 4 —, — 5 — are still valid. For convenience we call 
this algorithm modified Jordan elimination. 

Let us now introduce the linear forms 


Me = Laine, k=1,2,...,m (13) 
j= . 
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having as matrix the transposed matrix of A. If we record in the corresponding 
table the independent variables u; on the left side and the dependent variables v, 
in the headline, we get the following picture. 





y= Vg >= ee oO v= ae a Vy, = 
Uy ay Ae ree” Ys —* am 
aS Gi ote css Bye dso Bae # (14) 
u,, ani ane Hag ans A anm 











This table coincides with the table (11). By a Jordan elimination step with 
pivot a,, (introducing v, as independent and u, as dependent variable) exactly 
the table (12) is obtained as follows from the rules — 1 — to — 5 —. Therefore 
a step of modified Jordan elimination applied to a given table (11) is equivalent 
to an ordinary Jordan step applied to the dual table (14). 

This result can also be used for proving theorems of linear algebra. For 
instance the fact that the rank of the rows of a matrix is equal to the rank of 
its columns is an easy consequence of this duality. 


II. Simplex method 


The basic problem of linear programming is to maximize a linear form 


m 
ad = 2s Xp (15) 
subject to linear inequalities 
Vj =D ajyn%t+e,20, 2 or (16) 
k=1 


as constraints. In the Euclidian m-dimensional space with coordinates x, the 
inequalities (16) generate a convex polyhedron bounded by the hyperplanes y;=0. 
The problem is to find a point in this convex set with maximal z. We may record 
that z is proportional to the distance of the point to the hyperplane 


Shm=0 (17) 


therefore the solution is in general a vertex of the polyhedron. The simplex 
method is a simple and straightforward algorithm for solving this problem 
algebraically; it is most easily explained dealing with an example. Let us take 
m= 2 and discuss the problem to maximize the linear form 


z=— 3x%,+ 6%, (18) 
under the »=5 constraints 
Vy=%+2%,+120, Vo=2%+%—420, Vg=%y—%+120 
Ye=m%—4%4+1320, ys=—4y+%4+ B20. (19) 
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The convex polygon and the line z=0 corresponding to this situation are ploted 
in figure 1. We shall use modified Jordan elimination, therefore the Jordan 
table of the 6 given linear functions is 








—% —& 1 
w%.=| —-1 -—2 1 
Ye=| —2 —-1 —4 
Ys=| —1 1 1 
w=|—-1 4] 8 29) 
%u= . 23 
sm 3 —6 0 














The first step of the computation is to eliminate the coordinates x,, x,. This is 
performed by two modified Jordan steps introducing two of the constraints 
























































6 - — for instance y,, ye — as independent 
Ps ted variables instead of x,, x,. The result is 
- , Uo ws 
Mm Z cal ce 1 
band fs a Ya = 1) “7 ° 
=a 
4 YA = —2 24 
{2}-8 j--2n0 | ‘i / . (21) 
hy a. Ys 
“9 Z z=| —5 4 — 21 
0 
l2 
4 The lines x,, ¥, have been cancelled; they 
«ll g , are 
A 2 
%=— gnutsyvet3, 
L | l (22) 
0 2 st al 6 é X= $%1— 32-2 
1 
Fig. 1. Simplex algorithm and will be used later on after finishing 


the algorithm. If the variables in the 
headline vanish, we are at the point A of the figure, intersection of the lines 
y,=0 and y,=0. At this point we have 


V3 = 6, Va = 24, ¥3=9 (23) 
z=— 21 
as follows from the last column of the table. Because all the constraints y,, ..., V5 


are =O the point A is contained in the convex set and as intersection of two 
boundary lines it is moreover a vertex of the polygon. 

It was of course sheer luck that the elements of the last column in the first 
three rows became positive. In general this is not true and then the problem 
arises to find a vertex of the polygon (a “feasible solution’’). This problem will 











Jordan elimination, linear programming and Tchebycheff approximation 7 


not be discussed in this paper, because it is immaterial for our purposes. We 
mention only that also this problem can be attacked by the simplex method. 
The strategy of the simplex method is now to proceed from the vertex A 
to another vertex along a segment of the boundary and increasing z on this 
path. (In the general m-dimensional case the path is an edge of the polyhedron.) 
Such a path is characterized by the property that one of the variables in the 
headline varies linearly, the other variables in the headline remaining =0. 
For instance 
w=0, Y=A. (24) 


The parameter A must be positive as required by the constraint y,20. Along 
this path z has the value 
z=—4A-—21<— 21 


according to the table. Hence z is diminished. But we want z to increase, there- 
fore we must select for the varying y a column containing a negative element 
in the last row. 


In our case only y, satisfies this condition: 


wM=A, ¥2=0 (25) 
Thus 
Vg=—A+6, W=—3A4+24, Ys=2A4+9 (26) 


z=5A—21 


and the constraints require that ys, y,4, ¥,; remain =O, hence 


As6={, AS#=8, AS—F. (27) 
The last inequality is satisfied automatically because A20. Therefore 
AS6 


and we choose A=6. This number is the smallest among the positive quotients 
of the elements of the last column divided by the elements of the y,-column. 
This choice yields y,=0 and our path stops at the vertex B characterized by 
Yg=0, yg=0. Therefore a modified Jordan step is carried out introducing y, 
in the headline instead of y, (pivot y,, y;). The new table is 








— Ys =—— 3 

y= 1 —1 6 

Y=| —3 (1) 6 
(28) 

z= 5 — § 9 














and z has been raised from (— 21) to 9. 


Let us resume: A step of the simplex method is a modified Jordan step 
with the following rules for the choice of the pivot: 
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1. The column of the pivot must contain a negative element in the last row 
(z-row). 

2. Build the quotients of the elements of the last column above the z-line 
divided by the corresponding elements of the pivotal column. The smallest 
among the positive quotients indicates the position of the pivotal row. 


According to these rules the next step is (pivot 2, V4) 








i “ie 1 
Ye=| —3 1 6 
‘ (29) 
> ioe ie 2 
:= 2 1 15 














producing the vertex C of the polygon. There is no longer a negative element 
in the last row, thus the algorithm comes to an end. From this last row it follows 


Z=— 2¥3— Yat 15 


V3, Yq being =O this quantity is maximal if y,=0, yz=0 . Thus C is the solution 
of the problem. From (29) it follows 


yy, = 12, Yo= 6, Ys= 15 








and taking into account (22) 
xy 7 3 , Xe isa 4 . 














Remark. At the start of the simplex algorithm the elements of the last 
column (with exception of the z-element) must be positive; they remain positive 
during the algorithm because the constraints are satisfied at the successive 
vertices of the polyhedron. 


At the end of the algorithm also the elements of the last row become positive 
because otherwise the algorithm could be carried on. During the algorithm the 
element at the lower right-hand corner of the table is a lower bound of the final 
maximum. 


Duality. Let us introduce the dual table of the start table (21) of our simplex 
algorithm 








%.= = w= 
Vg i<—1 6 
U4 3 —2 24 30) 
Us| —2 3 9 
oi -=§ 4 — 21 
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corresponding to the linear functions 
V, = V3 + 3g — 20,—5, Up = — Vg — 2U4+ 30,+ 4 
w = 6v, + 244+ 9v; — 21 
of the three independent variables v,, v4, v;. As has been pointed out in section I, 
each modified Jordan step (28), (29) carried out during the simplex algorithm 


is also an ordinary Jordan step with respect to the dual table (30). Therefore 
we are faced at the end of the simplex algorithm to the table given by (29) 








V3 = u4= w= 
| —2 1 | 12 
Us| —3 1 6 
: (31) 
Us 5 —1 | 15 
1 y 1 15 














The original problem of linear programming was to maximize z under the con- 
straints y,=0 (j=1, 2,...,5) as stated in (19). By definition the dual problem 
is to minimize w under the constraints v,20 (k=1, 2,...,5). Now it follows 
from the final table (31) 


w= 12v,+ 6v,+ 15, + 15 (32) 


At the end of any simplex algorithm the coefficients of the variables v in 
this equation are positive numbers because they are the elements of the last 
column. Furthermore these variables are required to be =O; therefore w will 
be minimal if 

Vy = Wg = U5= 0. (33) 
From (31) it follows then 
~— 2, —= 1 (34) 


and these quantities are always =O because they are the elements of the last 
row of the final table of the simplex algorithm. 

Therefore the constraints are satisfied and the solution of the dual problem 
is obtained. The minimum is w=15. During the algorithm the element at the 
lower right-hand corner of the tables is a lower bound of this minimum because 
this quantity increases monotonically from one table to the next. 

Thus we have the following interesting result. The simplex algorithm solves 
the given problem and its dual problem simultaneously. The maximum of the 
original problem is equal to the minimum of the dual problem. 

Finally we want to emphasize the following fact. The solution of the dual 
problem has been found by annihilating the variables on the left side of the 
final table. If we try to do the same with an intermediate table, no admissible 
set of variables v,, v2,..., v5; is obtained because the elements of the last row 
are not yet positive and thus the constraints are not satisfied. They are only 
satisfied after the last step of the simplex method. 








2 








10 E. STIEFEL: 


III. The Tchebycheff problem 


Let be 
N= DL ajrXet+G, 4=1,2,...,%, (35) 
k=1 
n given linear functions of the m independent variables x,, x2, ..., % and let be 
¢ = Max|y,|, 4=1,2,...,%. (36) 


The problem is to minimize ¢ by appropriate choice of %,, %2,..-, %m- This 
problem will be reduced to a problem of linear programming by the following 
definitions. It will be helpful to intro- 
duce one more variable x,,,, and the 
2m linear functions 


Vj = 5 + X mai 


m 
=), Asp Xpt Xmoit o; 
k=1 





* 
Vi = — 1X mss 
m 
xy x tae weit Fi Xpt Xmta— & 


Fig. 2. Adjoint linear program j ss 1, 2, ...M, (37) 
and finally the linear form z=%,,,, (having m coefficients =0). The adjoint 
linear programming problem of the Tchebycheff problem is then to minimize z 
by appropriate choice of the (m+ 1) variables x, subject to the constraints 


It is not too hard to prove that the values of x,, %2, ..., %,, corresponding to 
this minimum are the solutions of the Tchebycheff problem. 


Proof. Let z be the minimum of the linear program. The constraints may be 
written 
z2n;, *2-—N;, J=1,2,...," (38) 


taking into account (37). If now %,, x2, ..., % have fixed values and only ~x,,,, 
is allowed to vary, the quantities 7; in (38) are also fixed and the minimum of 
Z= X41 1s the larger of the two numbers 


Maxn;, Max(—7,), 7=1,2,...,” (39) 


this is to say equal to Max |y;|=¢. This restricted minimum ¢ of z is of course 
not smaller than the minimum 2, corresponding to the free variation of all the 


variables %,,..., %4+41; hence €=z». This finishes the proof and z, is the wanted 
minimum of ¢. 


Figure 2 should facilitate the understanding of the proof. We took m=—1, 
thus 


; = 451 xy -f Cj. 
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The convex polygon generated by the constraints is bounded by the lines 
Xe = 4% +C;, %q = — (A;1%+6;). 


For a given subscript 7 the two lines are symmetric with respect to the x-axis. 
The solution of the linear program and of the Tchebycheff problem is given by 
the lowest point of the polygon. 

In the general case the polyhedron of the linear program is located in the 
(m-+-1)-dimensional space spanned by the coordinates x, %2,..., %4,- In order 
to apply the simplex method as described in section II we replace the problem 
of minimizing z by the problem of maximizing (— z). The simplex method starts 
at a vertex of the polyhedron and proceeds downstairs to the lowest point along 
a path composed of 
edges of the polyhedron. 
We want to give also a 
geometrical description 
of this procedure re- 
stricting ourselves to the 
space a of the Tche- 
bycheff problem spanned 
by the coordinates %,, 
Ras 2055 Me 

A vertex of the poly- 
hedron is characterized 
by the property that 
(m+ 1) among the func- 
tions Vj» ye vanish, the Fig. 3. Zunovickn’s method 
remaining ones _ being 
positive. From (37) it follows by an elementary discussion that the projection 
of a vertex onto z is characterized by the properties: 

4. (m+4) functions among the functions |7,;| have the same value 9. 

2. The remaining functions |7,;| are So. 

We call such a point of a an admissible point. The algorithm builds up a 
chain of admissible points diminishing monotonically the quantity o= x,,,,. 

Figure 3 is the picture of a special case giving a very good idea of the be- 
haviour of the procedure. We have m=2, n=5, thus 








Nj =4j1%+4jo%2.+C;, 7=1,2,...,5. 


Furthermore we assume 
2 2 
Qji + @jg=1. 


The |7,| corresponding to a given point are therefore the distances from the 
point to the lines 7,=0 
Aj X%y + Ajg%qt+c;=0 (40) 


and an admissible point is the centre of a circle tangent to 3 of the lines and 
intersecting the other ones. The algorithm goes through ever waning circles of 
this type to a final circle solving the Tchebycheff problem. 
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This application of the simplex method to the problem of Tchebycheff is 
exactly the method proposed by ZUHOVICKII [4] without establishing the relations 
to linear programming. We have of course the additional problem (mentioned 
in section II) to find a first feasible solution, that is a vertex of the polyhedron 
or an admissible point in the space z. This can be done by a preliminary routine 
different from the simplex algorithm and outlined in the paper [4]. 


The exchange method. We may solve the adjoint problem (37) also as a 
dual problem of linear programming and we prefer to explain this dealing with 
the following example (m= 2). 


The functions of the Tchebycheff problem are 
m= H+ %, Ne=2Hy+%y—4, Nye=3M+%—3, Me=4% + %2—}- 


The adjoint problem of linear programming is therefore to minimize z= x, under 
the constraints 


Vy =| + %3 = % + %2 + %=O Vi =— mt %=— %— %2+%20 

Yo=Not+ %y=2%y+%+%—420 yr =— et X= — 2%, — %_+ %3+420 
V3 = Ng + %3 = 3%,+ %2+ %—320 V3 = — 3+ % = — 3%, — %2+%+320 
Va =a t+ %3 = 4%, + %2 + %3—3=20 Ve =— gt % = — 4%, — %2+%,4+320. 


The table of the dual simplex method stands as follows. 








v= = w= = R=Y= y= y= s= 
a a wt wu ah us 
wi1 4 14 4 ah Ee oh 'ont e 
(42) 
as + -¢ 3 ‘  & ag 
‘6 ~¢ ~§. «3 .  . +: S48 














As in the general simplex method the coordinates %,, x2, x3 are eliminated intro- 
ducing 3 of the 8 variables y, y* as independent variables. (3 Jordan steps.) 
In the special case at hand it is of course impossible to use only variables y as 
new independent variables because the matrix built by the x-rows and y-columns 
has only the rank 2. 

We made up our mind to introduce yy, yg, yg; the néw table is after cancel- 
lation of the columns 4%, %2, %3 








A= w= wW= W= M=Z= 
Zi~G @ 4 4 FTF 
Vo 1 1 1 0 o| 4 
g] ab -b -1 4 @2 - 
1 : ¢ 4 3 ~853 














We are ready to begin the simplex algorithm because the elements in the last 
column are positive and we know that they remain so during the algorithm. 
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(This has been pointed out at the end of our presentation of the simplex algorithm 
in section II.) If not all the elements of the last column would be positive, 
there is an easy way for repairing this*. But we do not insist upon this difficulty 
which turns out later on to be immaterial. The element at the lower right-hand 
corner of the table is a lower bound for the Tchebycheff minimum: 


% = Ming= 8. 


Before proceeding with the simplex algorithm we give — as we did above — a 
geometric interpretation of the table (43) in the space a of the Tchebycheff 
problem spanned by the coordinate axis x, %9. 

If the variables yf, y,, y3_ on the left side of the table vanish, we get a point 
in the space x, %,, %, and its projection P onto a has the properties 


— == —Ng=— %=—2=—} (44) 


as follows from (41). We introduce now the lines E; in the plane 2 determined 
by the equations 7;=0. Any three of the lines build a triangle called a reference 
using the terminology of the paper [7]. In the general case of the Tchebycheff 
problem a reference is a simplex of the m-dimensional space bounded by (m-+ 1) 
hyperplanes chosen among the hyperplanes E;: 7;=0. We show now that P 
is a point in the interior of the reference-triangle E,, E,, E, corresponding to the 
variables on the left side of the table. The homogeneous parts yy, 2,3. of yy, 
V2, ¥3_ Satisfy indeed the relation given by the last column of the table 


z= X= byt + bet t98 
and putting x,=0 we have 
4(—) + 42+ 4(—7%) = 0. (45) 


n; being the homogenous part of 9;. The coefficients in this linear relation are 
positive (also in the general case) because they are the elements of the last 
column of the table. From (44) it follows that the values of 7, 2,3 at the 
point P have the same signs as the coefficients in the linear relation 


$7 — PUP + 373 =0. 
Hence P is inside the triangle. Furthermore the values of |%,|, ||, |73| at P 
are equal =}. Therefore P is called the centre of the reference and the number } 
is called the reference-deviation. As pointed out above, the reference deviation 
is a lower bound for the Tchebycheff minimum. The values of the remaining 7, 


are obtained by subtracting the reference deviation from the elements of the 
last line: 


mM=Mu—%=§—-F=3, N=%¥s—%=$-F=3, m=M—%—~=4-F=¥ 
—m=¥2—%=$—-F=$, —m=yi-—m=—§-F=-¥. 





* If a y-row has a negative element in the last column, y has to be replaced by 
the corresponding y* carrying out a Jordan step. Inversely a y* on the left side 
of the table and producing a negative element in the last column has to be replaced 
by the corresponding y. 
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The pivotal column for the first step of the simplex method is the column yf 
containing a negative element in the last row. Obviously this column may also 
be characterized by the property that the corresponding value |7,4| is larger 
than the reference deviation. The pivotal row is y3. 

A reader familiar with the paper [7] will certainly be convinced by this 
discussion that this method of solving a Tchebycheff problem is exactly the 
exchange method developed in [7]. it can also be shown that the rule for finding 
the pivotal row stated in section II coincides with the “exchange rule” stated 
in formula (21) of the paper [7]. 

After this first step of the simplex method, the following table is obtained 


= Zz2= 


A= w= %= W= Ve 





vi 
Ye 


* 


Va 
a cok Oe ee ok 

















The new reference is F,, E,, E, and the reference deviation is raised to the 
value 3 and this is the final value because the algorithm stops, the elements of 
the last line being positive. 








/ ral \ \ 


Fig. 4. Exchange method 


Our algorithm builds up a chain of reference centres always raising the 
reference deviation. Figure 4 is again the picture of the special case where the 
|y;| are the distances from the lines E;. The Centre of a reference is then the 
centre of the incircle of the corresponding triangle and the reference-deviation 
is the radius of the circle. The algorithm goes through ever waxing incircles to 
the last incircle intersecting all the lines. 

We have now established our main result about the duality of the exchange 
method and ZuHOVICKII’s method and about the equivalence of both methods 
to linear programs. 
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Reduced tables. It is a serious disadvantage of both methods that the number 
of linear functions is doubled by replacing the Tchebycheff problem by a linear 
program. It is desirable to use only the functions y,; without introducing the ye 
This can be done in many different ways and no attempt is made to describe the 
tedious transformations of the formulas. But in the case of the exchange method 
we may use the formulas already prepared in the paper [7]; they are repeated 
in the sequel for the convenience of the reader. Taking again the example (41) 
we use the reduced table 


m= = 13> Y= 
x,| 1 2 3 4 
X%_| 1 1 1 1 |. (46) 














1] 0 —4 -3 -3 





At first x,, x, are eliminated introducing two arbitrarily chosen 4 as independent 
variables, for instance 7,, 3. The result is 








= 
m ; 2 
m| + @Q). (47) 
1|-$ 3 











A first reference is built composed by the variables 7, 43 on the left side of the 
table and by an arbitrary variable on top of the table, say y,. There is a linear 
relation between the homogeneous parts of the three variables 


AM + Ao + AgH3 = 0. (48) 
From the first column of the table it follows immediately 
A=1, A,=—2, A,=1. 
The reference deviation / is computed by the formula 


bow Aim +4em2 +433 __ A, (— $) ee 


[At + [al + 14s] TAT TAT +145] 4’ 


where the relation (48) has been taken into account. At the centre of the reference 
the variables 7,, m2, 43 have the values 





m=hsgnd= 3, y2=hsgnd4,=— 3, yg =hsgn d=}, 
hence from the column 7, of the table 
m=—tnm+inti=*. 


A new reference is built accordingly to the following rules. 
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First exchange rule. The variable 7; to be introduced in the reference must 
have its absolute value at the centre >|h|. In our case this is mq. 

Second exchange rule. Let be y, the variables of the old reference and 7; the 
new variable (determined accordingly to the first rule). There is a linear relation 
between the homogeneous parts 


i+ Zi Malta = 0- 


If at the centre of the old reference 4n;=0, the subscript o of the variable to 
be removed from the reference is given by the smallest of the quotients u,/A,. 
If hyn; SO take the largest quotient. 


In our case we read from the second column of the table 
a= — 3m + 37s 
thus 
4=4, f2=0, Mg=—F, 
;, 0, one 3. 


hyng=}-4>0 


The quotients are 


At the centre we have 


thus the smallest quotient (— 3) must be taken and 7, is removed from the 
reference. This fixes the pivot as indicated in (47). The new table is 








m= > 
m § $ 
"4 3 5 (49) 
1} —3 —4 











and the new reference 7, 2, 44. The same formulas yield h= # and at the centre 
m=%8, m=—% m=%, m=. 


Because all the 4; are smaller than the reference deviation, the solution of the 
problem is obtained. 

Finally we want to draw the attention of the reader to the following fact. 
The difficulty of constructing previously a “feasible solution’”’ (that is to generate 
a positive last column of a simplex table) has disappeared. Any reference can 
be taken as start for the computation with reduced tables. This advantage of 
the exchange method is not shared by ZUHOviIcKuI’s method. 
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Zum EinschlieBungssatz von Kryloff-Bogoliubov 
fiir allgemeine Eigenwertaufgaben bei gew6hnlichen 
Differentialgleichungen 
Von 
W. WETTERLING 


Der EinschlieBungssatz von KRYLOFF-BOGOLIUBOV fiir selbstadjungierte all- 
gemeine Eigenwertaufgaben bei gewdhnlichen Differentialgleichungen, wie er 
etwa bei CoLLatz [J] fiir volldefinite Eigenwertaufgaben bewiesen ist, soll hier 
in abgeinderter Form auch fiir die bei KAMKE [2] als im engeren Sinne definit 
bezeichneten Eigenwertaufgaben hergeleitet werden. 


Es sei eine Eigenwertaufgabe 


M[y] =AN[y] 
mit 
M{[y] = 2 lie) yx), Ny] = 2 [ee (2) y(x)]” (1) 


gegeben. Dabei sei n<m, f,, g, v-mal stetig differenzierbar in einem Intervall 
[a,b]. Die (linear unabhangigen) Randbedingungen seien 


U,[y]=0 (u=1,2,...,2m) (2) 
mit 
2m-1 
Tul] = 2 [aur ¥(A) + Bur ¥(0)]. 
Die Aufgabe sei selbstadjungiert, d.h. es sei 
b b 
J (uM[v] —vM{[u])dx=0, f (uN[v] —vN[u])dx=0 


fiir beliebige Vergleichsfunktionen « und v (2m-mal stetig differenzierbare Funk- 
tionen, welche den Randbedingungen (2) geniigen). Ferner sei die Aufgabe im 
erigéren Sinne definit: Es soll 


fuM{uldx>o 


sein fiir alle Vergleichsfunktionen auBer « =0, jedoch nicht notwendig volldefinit 
b 
(es wird also nicht verlangt, daB fuN[u]dx>0 ist fiir alle Vergleichsfunk- 


a 
tionen u auBer u=0). Dann gilt der Satz: 
Uo, 4, 4, seien drei Vergleichsfunktionen, die in der Beziehung 


M(u,] —tN[u,] = N|u,_4] (3) 
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zueinander stehen, wobei das abgeschlossene Intervall [0,¢] keinen Eigenwert 
der Aufgabe (1), (2) enthalt. Es sei 


b 
a, = f u,N[u9] dx 

b b 
a, = f u,N[u,]dx = f u,N\u9] dx 


‘ 
a, = fu, Niu,| dx. 


Dann liegt zwischen 


mindestens ein Eigenwert der Aufgabe (1), (2). 


Beweis. Nach KAMKE [2] hat eine selbstadjungierte und im engeren Sinne 
definite Eigenwertaufgabe unendlich viele Eigenwerte A; (i =1, 2, ...), die simt- 
lich reell und von Null verschieden sind. Mehrfache Eigenwerte sollen dabei so 
oft aufgezéhlt werden, wie es ihrer Vielfachheit entspricht. Es sei ¢;=sign A; 
und y,; die zu A; gehérende Eigenfunktion. Zu einer Vergleichsfunktion (x) 
kann man Entwicklungskoeffizienten 


b b 
1 
=e f ¥N[w]dx => [ yeM[ulax 


definieren. Sind u(x) und v(x) zwei Vergleichsfunktionen mit den Entwicklungs- 


koeffizienten c; und d;, so konvergiert die Reihe >) ¢,c,d;, und es gilt (s. [2]) 
die ,,Parsevalsche Gleichung“ sak 


tae 


€;¢;4; =f uN{o]dx=foN{w) dx. (4) 


Wenn man Gleichung (3) mit y; multipliziert und iiber [a, b] integriert, findet 
man (vgl. [4]) 
(JA| — #4) =e,c-9 (= 1,2), 


wobei c) die Entwicklungskoeffizienten von u, sind. Somit ist 


0 (0 
Ares Se 
PT gee 


Nach der Parsevalschen Gleichung ist 


=2e (a— 1)" 


Nun werden die Hilfsfunktionen y = a,4,— 43%) und gy =a,u,— au, eingefiihrt, 
fiir welche 





_ (ce) ia P 
(k = 0,14, 2). 


b 
SyN[g] ax = a3 a, - 2.a§ ag + a3 a, = Ay (a3 4a, — a3) 


2* 
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gilt. Die Entwicklungskoeffizienten von y sind 
(0) 
At a5." 
clo) c) 
as a3 
(a; — 4)? Aj—t 
Die Parsevalsche Gleichung (4) ergibt 


b co Cc 
fontglax= Se, eX fa,— (4-0) a? 


a 





diejenigen von 





Hier steht rechts eine Reihe mit nichtnegativen Gliedern; denn es ist ¢;= sign A;= 
sign (A;— ¢) nach den Voraussetzungen des Satzes. Es gibt nun mindestens einen 
Eigenwert A,, fiir den [a,.— (A;— ?) ag]? seinen kleinsten Wert annimmt. Man kann 
daher abschatzen: 


b 
Gz (434, — a5) = f pN[g] dx 


(0))2 
> (A, —¢ Me 
— [a — ) a)? a &; (4, — o> t)3 

= [a_— (Ay — t) ag]?- ag, 
wobei a,>0 ist. Darum kann man auf beiden Seiten der Ungleichung (5) durch ag 
dividieren: 


(5) 


(a3, — a§) => [a, — (Ap — #) ag]? 
Hieraus folgt 


p42 _ fm _ (Pag <4 4 [ie (ae). 
ta, as (2) Saset+ St as By 
In der Formulierung des Satzes und beim Beweis wurden nicht die Schwarzschen 


a 


Quotienten w.= “* und yvs= “* benutzt, da a,=0 sein kann. Ferner wurde 


beim Beweis nicht der Satz iiber die Entwicklung von Vergleichsfunktionen nach 
Eigenfunktionen benutzt. 

Zu bemerken ist noch, daB man fiir die Anwendung des Satzes drei iterierte 
Funktionen 4, 42, 4, bendtigt, wahrend man beim EinschlieBungssatz fiir voll- 
definite Eigenwertaufgaben nach [J], S. 188, mit zwei iterierten Funktionen 
auskommt. 

Beispiele: 

1. —y"=Axy, y(—1)=y(1) =0. 

Diese Eigenwertaufgabe ist ersichtlich nicht volldefinit. Man erkennt, daB mit 
A; auch stets — A; ein Eigenwert ist. Fiir die im obigen Satz auftretende Zahl ¢ 
wurde ¢=0 gewahlt, und es wurden, ausgehend von der Funktion 


Uy = — *8—x?+ 441, 
iterierte Funktionen 4, und “, bestimmt: 
60 u, = 2x8+ 3x5— 5x4 10284 72x43 
30240 u, = — 14x® — 27 x8 + 602x7+ 168 x — 294 x4 — 252%3+ 206% 4+ 153. 
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Die Zahlen a, sind a,—0,02462722, a,=0,001 862829, a,=0,0001495740. Der 
EinschlieBungssatz ergibt, daB zwischen 9,365 und 15,543 mindestens ein Eigen- 
wert liegt. Diese Schranken sind recht grob. Das liegt daran, daB die Schwarz- 


a ap ‘ p — . ” 
schen Quotienten 1h hier nicht, wie in den meisten anderen Fallen, gegen den 
k 


dem Betrag nach kleinsten Eigenwert A, konvergieren, sondern, wie aus Kon- 
vergenzbetrachtungen dhnlich denen in [J], S.192, hervorgeht, die Folgen 
26-1 und —“2¢_ gegen verschiedene Zahlen konvergieren, deren geometrisches 


Aep _ Gapti 
Mittel A, ist. 


Geht man von ug (x) =u%)(— x) = x3— x®— x +1 aus, so erhalt man Schranken 
fiir eine negativen Eigenwert, namlich —15,543 und — 9,365. An diesem Beispiel 
erkennt man auch, daB der EinschlieBungssatz von KRYLOFF-BOGOLIUBOV in der 
Form, wie er fir volldefinite Eigenwertaufgaben bewiesen ist [J[, fiir diese 
Eigenwertaufgabe nicht richtig sein kann, denn es wiirde sich bei der Berechnung 
der Wurzel fiir die Eigenwertschranken ein negativer Radikand ergeben. 


2. Die von K. Karas [3] angegebene, bei der Biegeschwingung einer Welle 
mit Beriicksichtigung der Kreiselwirkung auftretende Eigenwertaufgabe 


yV=A(y+ey"), (0) =y(1) =y"(0) =y'(1) =0 

(m )* 
1—c(nap 
definit. Fiir c=0,09 ist A,=—610,46232. Ausgehend von uy=x —10x8+ 
15 x4— 64° wurden iterierte Funktionen u,, u,, “3, “4, 4, und Eigenwertschranken 
berechnet: 





besitzt die Eigenwerte 4, = . Fiirc< 4 , ist die Aufgabe nicht voll- 


— 616,0465 SAS — 605,3197 aus u,u, und u%, 
— 612,1987 [AS — 608,7717 aus ,,u, und uz, 
— 611,0175 SAS — 609,91149 aus uy, ug und u,, 
— 610,6415 SAS — 610,2837 aus uy, u, und u,. 


Diese Rechnungen wurden auf der IBM-650-Rechenanlage des Instituts fiir Ange- 
wandte Mathematik der Universitat Hamburg durchgefiihrt. 
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Estimation of Iterated Matrices, with application 
to the von Neumann condition 


By 


TosIo KATO 


Introduction 


Let A be a linear operator in a finite-dimensional vector space X. If a norm 
is introduced into X so that X becomes a Banach space, the norm of A is defined 
as usual by ||A|]=max||A u||/||u|| for O--uEX. It is well known that (see 
DUNFORD and ScHWaRTZ [2], p. 567 or HILLE and PHILLIPs [4], pp. 124—125) 


_ — n\\1/n on 5 n||1/n 
(1) r=r(A) = Jim ||A" "= inf |A"| 
exists; 7 is called the spectral radius of A and is equal to the largest of the absolute 
values of the eigenvalues of A. 


It appears, however, that little is known about the growth rate of the quan- 
tity ~,=||A"||/r" for n—> co. Estimating p,, in particular uniformly for a certain 
class of A, is an important problem in connection with the question whether the 
von Neumann condition is sufficient for stability in the finite-difference methods 
in solving partial differential equations (see LAx and RICHTMYER [5] and RIcHT- 
MYER [6]). The object of the present paper is to deduce such estimates in terms 
of ||A|]| and certain other parameters related to the spectral properties of A. 
In particular we are interested in the conditions under which 4, is uniformly 
bounded. 


It should be noted at this point that the problem does not essentially depend 
on the particular norm employed. Since the norm of operators as defined above 
has the properties of a norm in the finite-dimensional vector space B consisting 
of all linear operators in X, all such norms are equivalent. In order to estimate 
p,, it is therefore sufficient to assume a particular norm from the outset. For 
convenience we shall employ a Hilbert norm and, consequently, we shall hence- 
forth assume X to be a unitary space (finite-dimensional Hilbert space). We 
denote by (u, v) the inner product of u, v€ X; the norm is given by |||| =(w, «)!. 


Roughly speaking, our main results may be summarized as follows. #,, is 
bounded for m=1, 2,... and for a set of A if ||A|| is bounded and if the larger 
eigenvalues (those with absolute values near to 7) of A have simple elementary 
divisors and are either separated from each other by a fixed amount or the associated 
normalized eigenvectors have the Gram determinant larger than a fixed number. In 
particular, it follows that the von Neumann condition is sufficient for stability 
if the amplification matrix satisfies the stated assumptions, a result extending 





Estimation of Iterated Matrices 23 


at once the two sufficient conditions given by LAx and RICHTMYER [5]. More 
detailed and general results will be found in Theorem 1, 1a, 2 and the remarks 
thereafter. 


Results 


Before stating our results, we recall some relevant notions of spectral theory 
(see DUNFORD and ScHWARTz [2], Chapter 7, and HaAtmos [3]). Let dim X = 
m<oo and let 4,,...,4,, sm, be the (distinct) eigenvalues of A; the set 
{A,,..., A,} is the spectrum of A. The set of all w€X such that (A —A;)"4=0 
for some integer » forms a subspace M; of X, which will be called the algebraic 
eigenspace of A for the eigenvalue 2;. The smallest for which this is true for 
all «€ M, is called the index of A; and denoted by »;. The dimension m;= dim M; 
will be called the algebraic multiplicity of A;; m; is equal to the multiplicity of A, 
as the root of the characteristic equation det (z— A) =0. We have 


(2) mm, isv,Sm,, 4=1,...,8 


A vector “=O is an eigenvector of A for A; if (A —A;)u=0. The spectral radius 
of A is 

(3) r= max|A;| S|[Al- 

The positive number 

(4) 6; = min | 4; — A,| 


will be called the separation distance of A,. 
Our first theorem now reads 


Theorem 1. Let A be a linear operator in a unitary space X, dim X =m< oo, 
with spectral radius y>0. Let J" be a simple, closed rectifiable curve with length / 
lying in the circle |z|<7 and having a distance «>0 from the spectrum of A. 
Let there be g (distinct) eigenvalues /,, ..., 4, of A outside I (which will be called 
the outer eigenvalues) and let », and 6, be the index and the separation distance 
of A,, respectively, for these outer eigenvalues. Then 





A*ll < m1 
Pn = y” — Ss (|Al| +7, 
” \|.4 m+me—2 f (vp, 2) 7 
+OG J" “(ll l|+7+- ) es a 6 = 4, 2, 3, ..-. 
where /(»,, ”), the only factors in (5) depending on m, are given by 
= n coe of " 
(6) Hy,n) =14+(T) +--+") 


(we set o =1 and (;) =0 for j< 0) ‘ 


Remark 1. If only a single matrix A is considered, we could take as I’ a 
circle |z| =r’ with r’ just smaller than 7; then the outer eigenvalues would 
consist only of the eigenvalues with absolute value r. It may well happen, 
however, that a ditferent choice of I’ leads to a sharper estimate of ~,, especially 
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when a set of matrices A is considered. J" need not be fixed for all A of the 
set, but its length / should usually be bounded if the estimate (5) is to be uniform. 
It should be noted that all eigenvalues may be included among outer eigenvalues ; 
then we can simply set / =0 in (5) or (7) below. 

Remark 2. (6) shows that p, is of the order n’~*, where » is the largest of 
the indices of the eigenvalues with absolute value 7 (see Remark 1). Since yom 
in any case, p, never grows faster than n”~'. In particular ~, is bounded for 
n—»oo (for a single matrix) if all the eigenvalues with absolute value 7 have 
index one. In case there are many matrices to be considered, the following 
corollary to Theorem 1 is useful. 


Theorem 1a. In Theorem 1 let all outer eigenvalues have index one (in 
particular this is the case if all outer eigenvalues are simple). Then p,, is bounded as 


(7) pn (lA +7+3)" [sta ta(G) |, 2H 1.2... 


22a™ 6 

where 6 is the smallest of the separation distances for the outer eigenvalues. 

Theorem 1a gives a useful sufficient condition for p, to be uniformly bounded. 
But it becomes useless it one or more of the outer eigenvalues approach each 
other indefinitely. A sufficient condition for boundedness which is useful even | 
in such a case is given by Theorem 2 to follow. To state it we introduce the 
notion of the condition number * for a set of vectors u,,..., 4, of X. Let B and y 
be respectively the maximum and the minimum of the homogeneous function 


(8) || ty 4) + +++ +t, y|[/([4]? + --- + [¢,|?)4 


of the complex variables ¢,,...,¢,. The ratio w =£/y will be called the condition 
number of the set %,...,4,. It is easy to see that 6? and y? are respectively the 
largest and the smallest eigenvalues of the Gram matrix ((u;, 4); p<1,....n° 
Theorem 2. Let m, A,r, J’,1 and « be as in Theorem 1, and let all outer 
eigenvalues of A have index one, so that each outer eigenvalue A, has as many 
linearly independent eigenvectors as its algebraic multiplicity m,. Let u,, ..., U», 
m'=m,-+----+-m,, be a set of such m’ independent eigenvectors normalized by 





|| ~;|| =1, 7 =4, ..., m’, and let w be their condition number defined above. Then 
— ||A* ll (wm + 1) m-1 


if there is at least one inner eigenvalue of A (so that /=>2z«), and 
(9") p,So 


if there is no inner eigenvalue (so that m’=m). 


Remark 3. The condition number w can be estimated in terms of the Gram 


determinant D = det ((u;, u,)) of the m’ eigenvectors under consideration, accord- 
ing to 





(10) w < 2/D3. 
* The condition number is usually defined for matrices. The condition number 
of a set of vectors “,,..., u, as defined here is equal to the square root of the so- 


called P-condition number of their Gram matrix. 
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(10) is a direct consequence of Lemma 2 of Appendix; note that the trace of the 
Gram matrix in question is equal to its order m’ in virtue of the assumed nor- 
malization ||;|| =1. 

Remark 4. Theorem 2 leads directly to both Theorems 2 and 3 of Lax and 
RICHTMYER [5]. (9’) is essentially identical with the estimate underlying their 
Theorem 2, but the constants are improved considerably (see Remark 3). 


Remark 5. In order to obtain a good estimate of ~, by (9) or (9’), the m’ 
eigenvectors u; should be chosen ,so as to make w as small as possible. w may 
well be bounded even when two or more of the outer eigenvalues approach each 
other indefinitely. On the other hand, Theorem 1 or 1a is more convenient for 
application if the separation distances ot the outer eigenvalues have a positive 
lower bound 46, for then we need not consider the eigenvectors at all. 


Remark 6. Actually we could even generalize both Theorems 1a and 2 to a 
single theorem in which the outer eigenvalues are divided into two groups, the 
conditions of Theorem 1a being assumed for one group and those of Theorem 2 
for the other. We shall not do this explicitly here, but it would be obvious how 
to formulate such a generalization. 


Remark 7. Our results lead to the following generalization of Theorems 2 
and 3 of LAx and RICHTMYER [5]. Let the amplification matrix G(4?, g(42), k) 
have bounded elements and let its eigenvalues satisfy the assumptions of our 
Theorem 1a or 2 (or, more generaly, see Remark 6 above) where the constants 
1 and « or 6 are independent of ¢ and k. Then the von Neumann condition is 
sufficient for the stability. For details see LAx and RICHTMYER [5] and RICHT- 
MYER [6]. 

Proof of the theorems 
Proof of Theorem 1. The operator A admits the spectral decomposition 


(11) A= 3B +N). 
j= 
Here 4,,..., A, are the outer eigenvalues and A,,,,..., A, are inner eigenvalues; 
P’s are (oblique) projections satisfying the relations 
(12) P=1, P.P,=6;,8, j,k=1,...,8, 


and PX =M,; is the algebraic eigenspace for A,; N;’s are nilpotent and commute 
with P,: 


(13) N}i = 0, N; P, = BN, = 6;,N,, j,& =1,...,8. 
We write (11) in the form 
q 
(14) A =Aot Ab 
where 
(15) Ay= 2 AF +N), A,=4,P,+N,, kSq. 
i=q 


It follows from the relations stated above that 


q 
(16) A*=A§+)> 4}, % = 41, 2,3,.... 
k=1 
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We first estimate ||A6||. Since it is known that 





n_ 1 — 
(17) At = ser & A)A2" dz, 
r 
we have 
(18) Ass i *max||(z — A ya. 


For any z, however, we have the estimate 
(19) I|(2 — A) || S (l2| + lap */ TT 2 —4,|", 
j= 


where m;=dim M, is the algebraic multiplicity of 4,;; (19) follows immediately 
from Lemma 1 of Appendix. 
Since J7|z—A,|" =a” and at for z€I" by hypothesis, (18) and (19) give 


~~ ly 
(20) lass dali + 
We next estimate ||A}||, kg. It follows from (15), (12) and (13) that 
(21) * 4" P + ae 1N, +: +(, * att 
But it is known that 
aR a 
_ 2n1 fe A) dz, 
Tk 
where Jj, is the circle |z — A,| = 6,/2; thus 
6 Op 
(22) lls max |e — 4) (F)" (lal +r +4)", 


where we have again used (19), noting that /7|z—A,|""= (2)" and |z| <r+ a 
for z€ I, by hypothesis. 
On the other hand N, and its iterates are given by 


N,=(A—A)P,, Ne = (A — A)" B, 
as is seen from (11), (12) and (13); hence 
(23) Mel S Al] + | Ae Pel 
It follows from (21), (22) and (23) that 
m—1 
Atl s(t BP [Lal + (4) Lael All + Lal) + + 
(24) + (7 4) baa (LAI + Lab] 


<(fJ rm (late SNM), 


where /(», ”,) is given by (6). 
The desired estimate (5) now follows from (16), (20) and (24). 
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Proof of Theorem 2. Following the notations of the proof of Theorem 1, we 
have (see (14)) 
q 
(25) A=Agta’, A= DA, A= ABHA™, [Atl] SABI +4". 
For || Ag|| we use again the estimate (20). To estimate ||A’”||, we set 
q 
(26) P=) RB; 
k=1 


P’ is an oblique projection on M’=M,+---+M,, dim M’=m’. Note that we 
have 
A’=AP=PA=A'P=P'A’, A'*=A'*P’. 


Since u,; are eigenvectors of A for the outer eigenvalues, u;€ M’ and we have 
(27) A’u;=Au;=pjyuj, 7 =41,...,m', 
where /4,,..-, Mw 1S an arrangement cf /4,,..., 4, with each A, repeated m, times. 
For any u€ X, P’u belongs to M’ so that we have the expansions of the form 
m’ m’ 
P's = 24 mj, A’*« = A’* Prud, =p; tu;. 
a = 
Recalling the definition of 8 and y as the maximum and minimum of (8), we 
obtain 


m’ 
IPrure rd 4 
& 
m’ dl 
A" al? SBD [ap |? SB?" DG. 
[= am 


This gives || A’"«|| < (7"B/y) || P’u || <7" || P’|| || «|| and hence 
(28) A" |S 7a] P’. 


If there is no inner eigenvalue of A, we have Ayg=0 and P’=1. Thus (25) and 
(28) prove (9’). 

If there is at least one inner eigenvalue, it remains to estimate || P’||. To 
this end we make use of the relation* 


_ ae 4 -_ aia 
1 P= f(z A)"dz, 





whence we obtain as in (20) 





' l l «ileac 
p’ - a | ~j 4 fe m—1 


Since we have assumed that 0+ P’+1, however, it follows from Lemma 4 of 
Appendix that || P’|| =||1— P’||. Hence 

, l \m—1 
(30) |’ See (Al) +" 


The desired result (9) now follows from (25), (20), (28) and (30). 








* For the elementary results of the spectral theory of linear operators used here 
and above, see e.g. DuNFoRD and ScHwartz [2], Chapter 7, and Hatmos [3]. 
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Appendix 


In this appendix we prove some lemmas which are used in the text and 
which are also hoped to be of independent interest. 


Lemma 1. Let T be any linear operator in a unitary space X with the inverse 
T . Then 


(A1) 77 S71" Vdet 7], m =dim X. 


Remark. This estimate is the best possible of its kind, as is seen from the 
trivial example T =1. 

Proof. We have the canonical decomposition T= UH, where U is a unitary 
operator and H is a positive-definite, symmetric operator. Then T41=H71U 
and, considering the unitarity of U and U+, we have || T || =|| ||, || 7+ || =||H>], 
|det 7| =detH. Therefore, it suffices to prove (A1) for T replaced by H. Let 
0<hShyS---<h,, be the (not necessarily distinct) eigenvalues of H. Then 
||| =,,, || 4-4 || =1/A, and detH =h,... h,, so that (A1) is obvious. 

Lemma 2. Let H be a positive-definite, Hermitian matrix of order m and 
let B and y be the largest and the smallest eigenvalue of H, respectively. Then 
(det and tr mean determinant and trace, respectively) 

B 4 trH \m 
+<GatsT: 





Remark. The constant 4 cannot be improved in general. One can easily 
construct an example showing this for m=2 and then also for any m=2 by 
formation of direct sums. 


Proof. Let 
(A2) O<y=hShS---hy =B 


be the eigenvalues of H. Then detH =/,...h,,, tr H =h,+---+h,, so that 
Lemma 2 is a direct consequence of 





Lemma 3. For any m positive numbers /,,..., ,, satisfying (A2), we have 
hon 4 hy tices + lm \™ 
(A3) : a oe m :* 


Remark. The same remark as for Lemma 2 applies. 
Proof. (A3) is implied by 


- fm 5 lim \m 


-m—1" 2 6 2 = 





’ 


m 


which is nothing but the well-known inequality for arithmetic and geometric 
means *. 

Lemma 4. Let P be a (oblique) projection (P?= P) in a Hilbert space such 
that 0+ P+1. Then 


(A4) |1—Pll=||Pl] (21). 





* The original proof of the author was much longer. The simple proof given here 
is due to the members of our laboratory. 
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Proof. Since 1— P is also a projection with 0 -+-1— P +1, it suffices to prove 
(A5) 4 — PS |Pl=t. 


Since P?= P and P 0 imply || P||=1 and similarly ||1— P|| 21 holds, (A5) is 
obvious if || P|]=1. Hence we may henceforth assume that ||P||>1. For any 
a such that 1<a<||P||, there is a «+0 such that || Pu||=al|«||>0. Set 


(u, Pu) Pu. 


v=—-4141- 
|| Pu ||? 


Then (v, Pu) =0 and so 
(4, Pu)? 
P\lull? ~ 


Since (4— P)v =(1— P)u +0 in virtue of || Pu|] >a||u|| >|| «||, we have v +-0 and 


a— Pol? Ss (4 — P) ue |fP[Pull? ~ [lPull? ~ oe. 
Ilo}? []ee |? [Pa]? — (ue, Pr)|? ~ lef? 


Ilo? = lle? — 





(A6) 


the middle inequality of (A6) follows from 
Il? [I(4 — P) « |]? — []] |]? |] Pa |? — [(m, Pm) |?) 
= [||~||? — |(u, Pw)|]? + 2||«||?[|(u, Pu)| — Re(u, Pu)]=0. 
(A6) shows that ||1— P||=a. Since a was arbitrary if 1<a<||P||, this proves 
A5). 


References 


[1] Courant, R., and D. HiILBEert: Methods of Mathematical Physics, Vol. I. New 
York 1953. 

{[2] DunForpD, N., and J. T. Schwartz: Linear Operators, Part I. New York 1958. 

[3] Hawmos, P. R.: Finite-Dimensional Vector Spaces. Princeton 1958. 

[4] Hite, E., and R. S. Puiuips: Functional Analysis and Semi-groups. Amer. Math. 
Soc. Colloquium Publ., Vol. XX XI, 1957. 

[5] Lax, P. D., and R. D. RicHtMyeEr: Survey of the stability of linear finite dif- 
ference equations. Comm. Pure Appl. Math. 7, 267—293 (1956). 

[6] RicutmMyeEr, R. D.: Difference Methods for Initial Value Problems. New York 
1957. 


University of Tokyo 
Department of Physics 
Bunkyo-Ku, Tokyo 


(Received August 11, 1959) 











Numerische Mathematik 2, 30—41 (1960) 


Numerical methods and existence theorems for ordinary 
differential equations 
By 
T. E. HULL and W. A. J. LUXEMBURG 


1. Introduction and summary 


We shall be concerned with existence theorems for 
x’ = f(t, x), X (to) = Xp (1) 


where the function / is required to satisfy certain conditions. 


To prove such theorems one must first introduce a method, such as the 
method of successive approximations or the Euler polygon method, to generate 
“approximate solutions’. Then one investigates the convergence of these ap- 
proximate solutions. 

Our purpose is to consider whether more sophisticated methods, such as are 
useful in numerical calculations, can also be used to generate suitable approximate 
solutions. The problem is relatively trivial for Runge-Kutta methods, and so 
we do not consider them. It is however considerably more difficult for multi- 
step methods, but we believe our results are essentially complete for this case. 
Our proofs differ in some important respects from those of CAucHy, LIPSCHITZ 
and PEANO, but are otherwise similar in spirit. 

We first introduce the notion of stability of a numerical method relative 
to a class of functions. We then establish necessary and sufficient conditions 
for a multi-step method to be stable, relative to the class of functions / satisfying 
certain continuity and Lipschitz conditions. We next introduce the notion of 
consistency, and establish necessary and sufficient conditions for a multi-step 
method to be consistent. The existence theorem is then proven with the aid 
of the approximate solutions generated by any stable, consistent multi-step 
method. 

All the above results are then generalized to the case where the Lipschitz 
condition is replaced by the Osgood condition. 

Finally we consider the case where / is required only to be continuous. Here 
again the existence theorem (t.e. Peano’s theorem) can be established, but only 
with the aid of a smaller class of multi-step methods. The notion of stability 
loses its significance in this last case. 


2. Numerical methods and stability 
A numerical method is designed to yield approximations at t=fp, t,, te, ... 
to the solution x(¢), if any, of problem (1). For simplicity we shall assume that 
bait; for ¢=(, . : eee 
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It is convenient for our purposes to introduce the approximations in two 
stages. First, “intermediate’”’ or “idealized’”’ approximations y, at ¢, are intro- 
duced, and defined recursively by difference equations of the general form 


Vn = (tortrs -+-r bn» Xo» Vor Var -++> Vn) (2) 


As indicated by the notation, (2) may define y,, only implicitly. (These difference 
equations are designed to approximate (1), but whether they do or not is not 
relevant for a consideration of stability.) As a typical example of (2) we could 
have Yyy=%9 for n=O, a truncated Taylor series for n=1, 2, and a formula 
like (5) below, with k=3, for n=}, 4,.... 

The second stage in the approximations arises because (2) cannot be solved 
exactly. We obtain only an approximation to y,, which we denote by z,,, and 
which satisfies 


By 2 B (bg, bys 000 bes Bq, Ses Sys +++ Bq) + % qe (3) 


The value of the “local error’, 7,, will depend on the details of the rounding 
procedures, and, in the implicit case, on how the needed iterations are started, 
and terminated. We lose no generality in assuming that the values of ¢ are 
known exactly. 

We shall consider that the numerical method is defined by (2). A particular 
procedure associated with (2), such as is defined by a computer program based 
on (2), will however yield results which satisfy (3). 

Carrying more decimal places, changing rounding procedures, and so on, 
will yield a new approximation Z,, which satisfies 


Tn, = G (be, by, «++ bgs Bes Ber Fas +000 8n) +% n> (4) 

We are now in a position to define stability. We shall say that a numerical 
method is stable, relative to a class of functions, if for each member of the class 
and for any e>0 there exist 6=6(e) > 0, 4, T such that |z, —z,|<e whenever 


»|",—7;| <6, the result being required to hold uniformly for max (t;—#;_;)</o 
i=0 


and for all such that 4,<T. 


Thus, roughly speaking, a method is stable if it leads to results which are 
insensitive to small changes in the local errors. 

It should be emphasized that stability is a property of the numerical method, 
relative to a class of functions. It has essentially nothing to do with how well 
(2) approximates (1). 


3. The stability of multi-step methods 
We now want to establish necessary and sufficient conditions for a multi- 
step method to be stable. The theorem to be proven was given originally by 
DAHLQuIsT [1]. We present our version because the proof is simpler, and because 
part of the argument will be used again later. 
A numerical method is called a multi-step method of order k if t;,, —t;=h 
(¢=0), and if equation (2) with »+ in place of » can be written in the form 


Le Vntk + %e—-1 Vnte—-1t* +H Vy 
- A(Bef (tren Yak) + Beal (tnse—1> Yn¢e—1) + °° + Bol (tn, Yn) (5) 
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for n=0, where a; and £; are constants which are assumed to be real, and where 
a, +0 and either a) or By=+=0. The values of yo, ¥,,..., Vz, are defined by some 
other formula, such as a truncated Taylor series or a Runge-Kutta formula, 
which can easily be chosen so as not to interfere with the stability or, for later, 
the consistency of the method. 

We shall consider the class of functions / which are continuous and which 
satisfy a Lipschitz condition of the form 


If, x) —f(,%)|SL|x—2| (6) 


in a region of the ¢—.x plane which contains all the points being considered. 


We shall say that the multi-step method satisfies the ‘‘root condition”’ if 
the roots of 


k . 
a (g) = 2% o" 


lie in or on the unit circle in the complex g-plane, and are simple if on the circle. 
We prove the following: 


Theorem 1. A multi-step method is stable, relative to the given class of functions, 
if and only if it satisfies the root condition. 

If the special cases of equations (3) and (4) which correspond to the multi- 
step formula (5) are subtracted, and if we put e, =z, —z, we obtain 


Op Cnt e+ Ope ppp—1 + °° + My ly 
=N (Bef (trans 2n+n) — Bel (tran 2n+e) +771) +R (nte — Tuten) 


We shall denote the right side by G,,. As for the equations defining ¢é, ¢,, ..., ¢,—, 
we need only require that they be such that 


k-1 k-1 - 
dlels kK XI|"4—7;| 
1=0 1=0 


for some K independent of 4 when hSh/y. This is easily done for any h, but 
we shall later introduce a particular value of h, for which we want the inequality 
to hold. 

To prove the necessity of the root condition we meed only consider the case 
where / (¢, x) =0 and where 7;—7;=0 (t{=h). In this case G,, =0, and e, is simply 
a linear combination of & linearly independent terms of the form n”9” for m=0, 
1,2,... and where @g is a root of «(@) whose multiplicity exceeds m. Unless the 
root condition is satisfied, at least one of these terms will approach infinity in 
absolute value as ” approaches infinity. We can always arrange to have the 


coefficient of this particular term not equal zero, by a judicious choice of é9, ¢,, ..., 
k-1 

€,-,, regardless of how small we take >) |e;|, and hence no matter how small 
k-1 i=0 

we take >) |7;—7;|. The root condition is therefore necessary for stability. 

i=0 


For the sufficiency we first use Duhamel’s principle to obtain 


n—k 
2D %,_;16,+¥%, n=k 
é, = i=0 


Y 


n? 


n<k 
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where @, is defined by 
Ot, Dy sp + Op -1 Dy pa +++ tag PD, =0 


and 
D,=9,=---=D,2=0, BD _,=a;,", 
and where the ‘‘particular’”’ solution is 
k—-1 k-i-1 
Y, => ( 2 tp Pu n—s—j-1) n=k, 


as can be verified directly. 
When the root condition is satisfied we have max|®,|=@®< oo and thus 


k-1 k-1 
\Wilska®@D> |e,| Ska DK D'|7,—7,|. 
i=0 i=0 


Here «= max |q, |. 
Only the term ®,_,G,,_, on the right side of (7) contains z, and z,. Using 
the Lipschitz condition, and putting 8 = max|f,| we can therefore obtain from (7) 


n—1 n k-1 
le,|<ABOL|ec,|+ AB OL (k+1)>d|e,|+a8> |r,—-7,|+ ka OK ¥ |r,—7,. (8) 
i=0 t=k i=0 


Let us now keep Ah, where h,B®L<1. We then have |e,|<£, where 
E,= BC and otherwise E,, satisfies the dominating difference equation 


E,=hA(E,-1+Ey-.+---+E) + BC, 
with 
(1—hBOL)A=BPOL(k+1), (1—A POL) B=max(«D, ka PK, kK), 


C= ba |r; —7,|. 
i=0 


Solving for E,, we obtain 


E,=(1+hA)"BC = (1+hA)4~4* BC, 
so that 
le,| SE, < e4™BC. 


Thus, for any choice of TZ#,,, the stability criterion will be satisfied by choosing 
6=e/(e47 B). Theorem 1 is now established. 


4. The consistency condition 


Stability does not imply convergence of the approximations, as h—>0, to a 
solution of the original problem. To illustrate this fact we can consider the 
simple problem x’=0, x(0)=1, and use the ridiculous but nevertheless stable 
numerical method defined by yy=1 and y,,,,=24,.- 

However, stability does ensure that two approximations, like z, and Z,, will 
not differ very much for any ». We might therefore say that stability ensures 
a good global behaviour of the numerical method. 

Some other condition is needed to ensure a good local behaviour of the 
method. We call the condition the “‘consistency’’ condition and we define it 

Numer. Math. Bd. 2 3 
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in the following way. We consider the general numerical method (2) and put 
h=max (t;,,—t,). Then we say that method (2) is consistent with problem (1) 
if (2) is satisfied by any solution of (1), except for terms which are 0 (h). 


For multi-step methods we take any solution x (¢) of (1) and substitute 
x(t, + ih) = x(t.) +2'(,)ih+0(h) for Yn4s, 
and 
x(t, + th) =x'(t,)+0(1) for f(t,si, Vase)» 


in (5). Then, after equating coefficients, we obtain 


co 


Me 


k 
a=0, | toy = 2B; 
t= 


1=0 +1=0 


ll 
i] 


which we call the « — conditions. 
We assume that the equations defining yo, y,,..., ¥,—, are consistent. 
We therefore have the following: 


Theorem 2. A multi-step method is consistent with the differential equation 
problem, if and only if it satisfies the « — B conditions. 

The first of these conditions means of course that one of the roots referred 
to in the root condition must be unity. An interpretation of the second condition 
will appear in the next section in the derivation of equations (16) and (17). 


5. The Cauchy-Lipschitz theorem 


Our plan now is to consider the role that can be played by numerical methods 
in the proving of existence theorems. In this section we consider the following 
well-known theorem of CAUCHY and LIPSCHITZz: 


Theorem 3. If f(t, x) is continuous and satisfies the Lipschitz condition (6) 
in a region containing the point (ty, x9), then there exists a unique solution to problem 
(1) over a suitable interval containing ty. 


In his original proof, CAucHy (see [5]) assumed in place of the Lipschitz 
condition that /(¢, x) had a continuous partial derivative with respect to x. 
Much later LipscuitTz [4] established the theorem in the more general case. 

Both Caucuy and LIPscHITz proved existence by establishing the conver- 
gence of the Euler polygons which are generated by the numerical method 
defined by yy>= 4%, and y,.,—y,=Af(t,, y,). Our purpose is to show that any 
stable, consistent multi-step method can be used to prove existence, although 
usually in a smaller interval. We shall see that our proof differs in an essential 
way from the standard one. At the same time we shall see that the standard 
approach is successful simply because it uses a method which is both stable 
and consistent. Unstable or inconsistent methods are obviously not appropriate, 
so our results imply that the stability and consistency of a multi-step method 
are necessary and sufficient for the proof of existence. 

Uniqueness can be established in the usual way, so we shall concern ourselves 
only with the proof of existence. We consider a stable, consistent numerical 
method defined by (4), where the initial values are obtained in an appropriate 
way (see (9) below). We show first that, in a suitable region, such a method 
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does uniquely define the numbers y,, ¥,41, Veio,---- We then show that 
Vn+1—Vn=O0(1) ash—>0. We then define the polygon 9 (¢) by linear interpolation 
between the points (¢;, y;) and show that |p (t) — p(t’) |S C|t—?’| + 0(1) for some 
C. Finally we show that 9 (¢) converges to a solution as h—0, and thus establish 
existence. 


We suppose that yo, y,, ..., Vz—, are determined so that 


; lvi— vol SNA, 1=0,1,...,k—-4, (9) 
and this defines N. 
For simplicity we take the region R, over which f(t, x) is continuous and 
satisfies the Lipschitz condition, to be (the rectangle) |¢—¢)|<a, |x — x |<. 
It will be convenient to introduce the slope 


s = max (b/a, ®(k+1)(8BM+aN),N), 


and to define the interval J, over which we shall prove theorem 3, by OS¢—t,<a’ 
where a’ =)/s. 

For the first step in the proof we again use Duhamel’s principle and rewrite 
(5) in the form 
k—1/k—-i-1 


P| ps tj Py nij-) (y,— Vo) » n=k, (10) 
=1'‘ j7=0 


n—k 
Yn — Yo = 2.9 y-i-16i a 
where @; is defined as above (for (7)) and where 


B= (Bel (tice Viren) +77 + Bol (t,, ¥))- 
The first « —f condition was used in the derivation of (10). 
From (9) we now obtain, for any (¢,, v,),..-, (t,, ¥,) in R, 


l¥n— Yo] S (m —k +1) OAB(R+1)M+hRa ONh 
<@O(k+1)(BM+aN) nh (11) 
<b. 


The existence of y,, for any ¢, in J now follows by induction. For, suppose y; 
exists and satisfies |y;—y9|<sih for i=k, k+1,...,2—1. We shall show that 
the same is true fori=n. If we put y,—v¥.9+0, the left side of (10) becomes b 
while the right side becomes less than or equal to 6. On the other hand, if we 
put y,—¥V)—0, the left side becomes — 0b while the right side becomes greater 
than or equal to — b. The continuity in y, of the right side of (10) then guarantees 
the existence of a solution of (10) for y, such that —bSy, —y Sb, and hence 
from (11) such that |y,—%¥v.9|<snh. The Lipschitz condition then ensures the 
uniqueness of y, in a standard way. 

We shall henceforth consider only ¢; in J. Our next step is to show that 
VYn+1—Vn=0(1) as h-0. To this end we subtract (10) from the corresponding 
equation for y,,,;—%¥p9. The double sums involved are O(h) because our initial 
values satisfy (9). The other sums have some contributions from the roots of 
a(o) which are inside the unit circle. These contributions are altogetherO (h) 
since it is easily verified that they are bounded in absolute value by A multiplied 
by simple geometric series. The contributions from the roots of «(@) which are 

3° 
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on the unit circle must be dealt with more carefully, except for the contribution 
from the root g=1 which cancels out. A finite number of terms in these last 
contributions are O(h). Making use of this fact, and using an explicit represen- 
tation of ®;, we finally obtain a result which can be written in the following 


way 
q 
Ysa — Iu h SE Blo Sy ye7t+OM), (12) 


where 0; (‘=1, 2,...,q) are the g roots of «(g), other than unity, which lie on 
the unit circle, and where 


B(o) =B, 0" + B,-10°-* +--+ Bo. 


Let us now put 
Est (t;, yj) @ + =F ej E;_i) f(t, Vi), 
where 
i 
=2 07°. 
p=0 


Since argo;--0 we have immediately-that there exists E such that |E}|<£ for 
i=1,2,...,q, and all7. We then obtain 


n—k n—k+1 
Zt 9) er" pe E}_ if (t;- 1» Yj-1) SE af (tj, 9) 
= “2 i(f(tj-1 V;-1) — F(t; y;)) +0(1) 
n—k 


= 25; 1 (f(G-15 9-1) — f(t; ¥j-1)) + 


1 BEG ¥-9 — 14,¥)) O(N), 
as h-—>0. Putting 
Zale = | | B (0;)/«’ (a,) | == JD) 
we obtain 


n—k n—k 
[Vata —¥ 1S Dh2If(t-, Vj-1) —fit,, ¥;—1) | + Dh Xt, ¥;-1) —f(é, y;)| +0 (h) 
r- i= 
n—k 
= Da’ max |f(tj-1, Yj—-1) — f(t), ¥j-1)| + DAL 2 |v; — yj-1| + 0 (A) 
2/2" j=k 
<o(1)+DhL 1% — Yj-1|, AO, (13) 
jun 


since / is continuous and satisfies the Lipschitz condition. 
From this last result we obtain 


Veta — ¥n| S (0(4 (4) + |v: — vol) (4+ DAL)" 
S (0(1) +191 — vol) e?**. 


But y, — Yp=O(h) so we obtain finally that y,,,,—y,=0(1), as h->0. 
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The next step in the proof is to show that there exists a constant C which 
is independent of / and such that 


lp (t) —g(t’)|SC\t—¢t’|+ 0(A), (14) 


as h->0, and where ¢ and ?’ are in J, and g(t) is the polygon obtained by linear 
interpolation between the points (¢;, y;). We have 


p(t) — p(t’) = (PO) — Yn) + (Yn — Ym) + (Ym — Y(t’) - 


We choose ¢, and /,, so that |¢—t,|<A and |t’—t#,,|<h. The first and third 
terms on the right are then 0(1). We lose no generality in assuming that »>m. 
For the middle term we then add up all the equations of the form (5) which define 
Yn» Vn—1» +++» Ym- Using the first « — condition and the fact that y;— y;_,=0(1), 
we then obtain 


(Zi) (6 — Ym) +01) = AE (By Mlin yd +o + Bolten Ye)» — (5) 


so that 


Yn — Ym S01) +E —(n — m) hi 


from which the required result (14) follows. 

To complete the proof of the existence theorem we first note that it follows 
easily, from (14), that the functions y(¢) are equicontinuous for any sequence 
of h’s which tend to zero. By Ascoli’s theorem the 9,(¢) associated with some 
subsequence h; must therefore converge uniformly to a limit, y(¢) say. In (15) 
we now put h=h; n=({t/h;], m=O, and we use the second «—f condition to 
obtain 


n= Yo= My DHlby¥) +0 (0H) + 0), (16) 
so that, as h;>0, we have 
y(t) — yo= J t(u, y(u)) du. (17) 


This completes the proof of theorem 3. Of course the uniqueness of the 
solution of problem (1) implies more, namely that  (t)—> y(t) for all h-0. 

We conclude this section with a few remarks. 

The form of the above proof was partly determined by our intention to 
generalize in later sections. Otherwise we would have been able to avoid the 
use of equicontinuity arguments. Indeed, once (14) has been established we 
could modify only slightly the argument of LipscuITz (see [2]) involving maximum 
and minimum polygon lines, to prove existence and uniqueness at the same 
time. 

In the usual proof of theorem 3 in which Euler polygons are used, one has 
immediately that y,,,—y,=O(h). In fact much is made of a stronger result, 
namely that one has ¢-approximate solutions. That y,,,—y,=0(1) is best, 
in the circumstances we are considering, can be shown by the following argument. 

Suppose / (t, x) =/(t), where /(¢) is continuous in the interval [0, 1]. Consider 
the stable and consistent method defined by y,,,..—y,=2h/(t,), and, for simpli- 
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city, with y;=yy=%». Then, with h=1/n, we obtain in place of (12) exactly the 
following : 


Yaa = Yu= (= 1)"-* (Im) & (— 1°). 


We know that the sum in this last expression is 0(m) as m—> oo. To prove that 
this is best we asume that it is not, i.e. that the sum is in fact o(g(n)) where 
g(n)=o(n) and g(n) is positive and increasing with . We then introduce the 
linear function L,, defined by 


Lah = 2 (— 11). 





We have ||L,,|| =”. But L,(f)=0(g(n)), so there exists a finite M, independent 
of m and such that |L, (f)| <M), g(n). Hence by the principle of uniform bounded- 
ness it follows that ||L,,||/g(2) is bounded, and this contradicts the assumption 
that n/g(n)—> oo as n> co. Thus y,,,—y,=0(1) is the strongest result we can 
obtain in general. Restricting further the class of functions in problem (4) can 
of course lead to stronger results; for example, requiring /(¢, x) to satisfy a 
Lipschitz condition in ¢ as well as in x leads to y,,.,—y,—=O(h). Alternatively, 
allowing no roots other than unity to be on the unit circle would also lead to the 
same result. 


6. The Osgood theorem 


In 1898 Oscoop [6] proved an important generalization of the Cauchy- 
Lipschitz theorem. He introduced the requirement that 


If (é, %) — F(t, x2) | = ow (|x = Xl). 


Here w(u) is a non-decreasing continuous function defined for ~=0 and such 
that w(0)=0 while 2(u) > — oo as u->0, where 


Q(u) = f (A/a (t)) dt 
for some u%)> 0. 
He then established the following theorem: 


Theorem 4. If f(t, x) is continuous and satisfies the Osgood condition in a 
region containing the point (ty, Xo), then there exists a unique solution to problem 
(1) over a suitable interval containing ty. 

This theorem can be proven with the aid of multi-step methods just as we 
proved theorem 3, except for the following minor changes. We must first modify 
very slightly the definition of stability in order to preserve theorem 1, and we 
also need to modify our treatment of the fundamental inequalities that appear 
in theorems 1 and 3. 

The proof of theorem 1 proceeds as before except that in the inequality (8) 
we obtain w(le,,|) and w/(\e,|) in place of L|e,| and L |e,| on the right side of (8). 
We are now unable to deal with the first term on the right side so easily. However 
this first term is 

hB Bov(ey|) < hB Go (26) 
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which is O(h) as h->0. Our dominating difference equation therefore becomes 
E, =hA (w(E, 1) + (E, 2) +--+ @(E9)) + BC +O(h), 


where now A =f O(k + 1), B=max (« ®, ka OK, K), while C is as before. Using the 
lemma established below we then obtain 


|e,| SE, $24 (Q(BC +O(h) + hAcw(leo|)) + (» — 1) AA). 


It follows immediately that we must in general choose both 6 and h, depending 
on ¢ if we are to ensure that |e,|<«. Thus our definition of stability must be 
weakened slightly to allow 4, to depend on e. The root condition then again 
becomes necessary and sufficient for stability. 

The proof of theorem 4 then proceeds exactly as in theorem 3 except that 
the lemma is needed again to handle 


[Yat — Yn] SO(1) + Dh Deo(|y;— ¥j-1)), h->0O, 
j= 


which is the counterpart of the inequality (13). We would introduce the domi- 
nating difference equation as before and use the lemma to obtain 


IYn+a— Yn| SQ (Q(0(1) + Dhal|y, — yo|)) +n Dh). 


It follows that y,.,—¥,—=0(1) and the proof then continues as before. 
We have still to prove the following: 


Lemma. Suppose w(u) and 92(u) have the properties described above, and suppose 
n—1 


that a,—hF >) w(a;)+¢ where h, F,a;,c>0. Then 
j=0 
a, S24 (Q(c+hFw(a)) + (n —1) FA). 


<<. a 
— 


For the proof we note that 
Q(a;) — Q(a;_1) S (a; — a;-3) 2" (a;_1) 
= (a; = a;_,)/w (a;_1) 
=hF. 
Summing these inequalities for 7=2, 3, ..., m, we obtain 
92(a,) S Q(a,) + (n —1)Fh, 


and the required result follows. 


Theorem 4 can be established under conditions which are somewhat weaker 
than Osgood’s. It is probably true that any condition which guarantees uni- 
queness will be sufficient. 


7. The Peano theorem 


In 1890 PEANO [7] was able to establish the existence of a solution to problem 
(1) when f(t, x) is assumed only to be continuous. It is well-known that the 
solution need not be unique. 
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Before considering what multi-step methods might be used in the proof of 
PEANO’s theorem we shall point out that the situation is now quite different 
in two respects. 

First, although the roots of «(g) will still play a fundamental role, the concept 
of stability is no longer significant. This becomes clear when we show that even 
the simple Euler method is unstable, relative to the class of continuous functions 
and according to our definition of stability. We need only consider the problem 
defined by x' =x, x(0)=0. We use the formula y, ,,— Yn=hV yn. With y,=0 
we obtain y,, =0 for all m. But with y)=h it is easy to see by means of a diagram 
and for sufficiently small h, that y, lies between the two curves y=??/4 and 
y = (t+ 2h)?/4 so that the polygon converges to y=#?/4 as h->0. Thus the Euler 
method is unstable in this case, and its instability is clearly due to the non-uni- 
queness associated with the class of continuous functions. 

Second, the non-uniqueness also prevents us from proving that y, ,,— ,,=0 (1) 
as h->0. At least we are not able to prove this result for all multi-step methods 
which satisfy the root condition and the «—f conditions. We consider again 
the problem x’ = |x, x(0)=0. We use the formula y,.,—y,= 2nV Yn With 
Yo=0 and y,=h we see, as above, that y,=0 for m even, while y, >#/4 as h>0 
for m odd. Consequently y,, .;—¥v,=-0(1). 

In view of these two remarks we shall have to modify our approach in dealing 
with the following Peano theorem. 


Theorem 5. If /(t, x) is continuous in a region containing (ty, Xo) then there 
exists a solution of problem (1) over a suitable interval containing ty. 

We shall indicate briefly how this result can be established with the aid of 
any multi-step method which is consistent, and whose roots other than the 
simple root at unity lie zmside the unit circle. (In passing we note that the Euler 
method and all the Adams’ methods are of this type.) 


Our previous difficulties were due mostly to the existence of roots, other 
than unity, which were on the unit circle. Without them we have almost im- 
mediately that y,.,—y,—=O(h). The polygons are therefore equicontinuous. 
By summing equations as we did before to obtain (15) and then (16), we arrive 
again at (16) without the term 0(1). Equation (17) follows so that any uniformly 
convergent subsequence converges to a solution, and theorem § is established. 

It is worth pointing out that the notion of ¢-approximate solution is not 
needed. 


8. Concluding Remarks 


Except for the modifications introduced in the case of the Peano theorem 
our results can be interpreted as showing that stability and consistency of the 
multi-step methods are necessary and sufficient for the proof of existence theorems 
for ordinary differential equations. Although the phenomenon of instability 
arises in a different way for linear partial differential equations, the same terms 
are used, and analogous results have been obtained for such equations. See, 
for example, JoHN [3] and RicHTMYER [8]. 


In conclusion it should be pointed out that the ‘‘stable’’ methods considered 
in this paper may be unstable in another sence which is important in practice. 
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For example, the roots other than unity which lie om the unit circle can give 
rise to the so-called weak instability, which is especially important in integrations 
over a long range. Such phenomena have been treated by DAHLQuIsT [J], parti- 
cularly in his dissertation. 
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Moments and characteristic roots 
By 
F. L. BAUER and A. S. HOUSEHOLDER * 


1. For a given matrix A of order n, real or complex, and a matrix X of msn 
linearly independent columns, form a sequence of iterates 


(1) X,=A'X, 


the ‘‘“moment”’ matrices 


_ 


(2) M,; = X#X; 
(the superscript ‘‘H”’ signifies the transposed conjugate), and the matrix 


Myo---Mo, 
(3) Wutint ... «a i 
M,,...M, 


~ 


which is at least nonnegative semidefinite. In the most important special case, 
X will be a single vector x, and the moment matrices will be scalars w;;. In 
this case M will be assumed nonsingular, and hence positive definite. The purpose 
of this paper is to exhibit a general method of obtaining inclusion and separation 
theorems for the characteristic roots A;(A) (for convenience they will be called 
simply the roots of A), expressed in terms of the moments. The method makes 
use of matrix and vector norms. For this application only Euclidean vector 
norms and the spectral matrix norm will be used (see [9] for a general discussion 
of norms), but some of the theorems have analogues expressible in terms of 
other norms. 

By a separation theorem will be meant one which separates the complex 
plane into two or more com, mentary sets, no set necessarily connected, each 
set known to contain at least one characteristic root of A. Theorems of this 
form are, in general, valid only for normal matrices. The common boundary 
of two sets is normally included in both, but in special cases it may be excluded 
from both. An inclusion theorem is one which exhibits a set known to contain 
at least one root. This set will be called an inclusion set or domain, and usually 
there will be associated with it another inclusion set that contains its complement. 
For nonnormal matrices these sets will overlap and separation theorems do not 
hold. Both separation theorems and inclusion theorems are to be distinguished 
from exclusion theorems of the Gershgorin type, which exhibit sets known to 
contain no root. However, it has already been shown [8] that at least the classical 
theorems of this type can be deduced by the use of norms. 





* Oak Ridge National Laboratory, operated by Union Carbide Corporation for 
the U. S. Atomic Energy Commission. 





Moments and characteristic roots 43 


For matrices with nonsimple elementary divisors the problem becomes much 
more difficult, and such matrices will not be considered here. A matrix with 
only linear elementary divisors is at least similar to a normal matrix, hence, 
following WIELANDT [29] will be called normalizable (to avoid the cumbersome 
““diagonalizable’’, and the questionable ‘‘diagonable’’). However, it is also similar 
to a diagonal matrix, hence can be written in the form 


(4) A=PAP2, 


where A is diagonal. 

In case the roots of A are known a priori to lie on a particular one-dimensional 
locus, the inclusion sets become arcs of the locus. Moreover, the moments or 
moment matrices may depend upon a single index only, as for hermitian matrices 


(5) M;..;= ij? 
and for unitary ones 
(6) M;-_; = M;;. 


Inclusion theorems are of theoretical interest, but they have also a very 
important practical application in that they provide approximations to the roots 
along with rigorous error bounds. This is especially so in case the vector x is 
an approximation to a characteristic vector such as could be obtained by one 
of the standard iterative methods [7]. 

2. The general problem of obtaining inclusion and separation theorems in 
terms of moments was formulated by WIELANDT [27—30], who has obtained 
rather general results, some published without proof, and who makes use of 
only a single vector x. BUECKNER [3], also using only a single vector x, obtains 
essentially complete results in the hermitian case, expressed for integral equations. 
LEHMANN [15—17], also working with integral equations, proposes the use of 
more than one initial vector. Many partial results are to be found in the literature, 
and references will be found at the end of this paper. The simplest of all is the 
well known theorem that the Rayleigh quotient j4,;,,/u42; of a hermitian matrix 
lies on the closed segment between the least and the greatest of the roots. This 
theorem implies all of BUECKNER’s results; and a natural generaliza’ on to the 
case of normal matrices implies all those of WIELANDT that relate to normal 
matrices. It turns out that from the fundamental Theorem I can be derived 
both of these results, as well as those of WIELANDT that refer to nonnormal 
but normalizable matrices [29, 31]. These derivations will be made here in the 
interest of unification. 


3. For any vector y +0, it is a fundamental property of any norm that 


4 yl SHAT Il yl. 
Also 


(7) AI|S|Al]x(P),  x(P) =||P IPA. 
where the scalar x is called the spectral condition number of P. For any matrix P, 


“(P)214, 
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and equality holds only with P unitary. Neither the matrix P, nor the scalar x, 
is uniquely defined by A, since the columns of P can be normalized arbitrarily 
and different normalizations may lead to different values of x. However, all 
theorems involving x remain valid when x is understood as the greatest lower 
bound of all x(P). In particular, for normal matrices it will be understood that 
x= 1. 

Evidently ||A|| is equal to the maximal modulus of the diagonal elements. 
Hence the domain defined by 


{a: |4| > |]4 yIl/|ly 1} 


is an inclusion domain for A. Let 
(8) a(A) =a +a,A+---+a,2” 


be any polynomial of degree y or less (the case «,=0 is not excluded). Since 
(4) implies that 
a(A) = Pa(A) P+, 
it follows that 
{A: |a(A)| 2% I] (A) yl/ lly ID 


is also an inclusion domain. Moreover, if 
(9) B(A) =Bo+ BiA+--> +B? 
is also a polynomial of degree y or less, and 
y=B(A) x, 
then, when « is replaced by «/f, the result is that 
(10) {A: [a (A)/B(A)| = > |[a(A) ~||/|[B(A) ~]]} 


defines an inclusion domain for A. But since « and £ are arbitrary polynomials, 
each of degree v or less, they can be interchanged, and therefore 


(11) {A: |a(A)/B(A)| Sx ||a(A) x||/||B(A) *|]} 
also defines an inclusion domain. 


Theorem I. Let the matrix A be normalizable, and let «(A) and B(A) be any 
two polynomials. Then (10) and (11) define inclusion domains for A, where x is 
the spectral condition number of any matrix P that diagonalizes A, and where x 
is any nonnull vector. 

This is the fundamental theorem, and from it all later ones will be derived. 
With B=1, this is essentially equivalent to a theorem of FRANKLIN [6]. Let 


(12) a = (9,01, +++), ax (B,,f,,..-B). 


where a and 0 are vectors of the coefficients of the polynomials « and f. Then 


(13) |a(A) x||*=a"¥ Ma, ||B(A)x|| =o" Mob. 
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4. Two polynomials «(A) and £(A) will be called mutually orthogonal (with 
respect to A and x) in case a(A) x and B(A) x are orthogonal 


(14) (B(A) x)"a(A) x=0 
and it will be convenient to speak of ||«(A) x|| and ||#(A) x|| as being the norms 
of «(A) and B(A). In particular if 

|| (A) x|| =4, 
then it will be said that «(w) is normalized. 1f a symbolic operator is defined by 
(15) Sf 222% dw (A) = MgalHoo> 


these polynomials are orthogonal in the ordinary sense. If «(A) and (A) are 
orthonormal, and if 6 is any scalar, then the polynomials « — 68 and B+ 6a are 
also orthogonal and have the same norm [1+]|6|?]!. It follows that 


Theorem II. Jf «(A) and B(x) are orthonormal, and 6 is any scalar, then 


(16) {A: |[«(4) — 6B (A)]/[B(A) + a (a)]| S x} 
defines an inclusion domain for the normalizable matrix A. 
The proof follows directly from Theorem I, since if « and f in (11) are replaced 


by «—68 and 8+ 6a, then the right member becomes simply x. If B=1 and 
« is linear, it follows that 


a (A) = (A — Mor/Hoo)/ [e121 — Hor rolHoo]?- 
In this special case (16) has the simple form 


{A: |(a(A) — 6)/(1+ ba (A))| <x}. 


These regions represent circles, all projected from equal spherical caps on the 
Riemann sphere. In this linear case the theorem and interpretation are due to 
WIELANDT [29], [31] who derived the result by geometrical considerations. 


More generally, since (16) can be written 


fA: |[a(A)/B (A) — 8]/[4 + 3a(A)/B(A)]| <x, 


the regions, for varying 6, all represent projections of congruent regions on the 
Riemann sphere. Then if 6 is chosen to satisfy 


, x25d=1, 
the inclusion domains become 
{A: Re [xa (A)/B(A)] = (x4 — x)/2}. 
But 
xd = —expid 
for some #. Hence 


Corollary. Given the hypotheses of Theorem II, an inclusion domain for A is 
defined, for any 8, by 


(17) {a: Re[(expi8) a (A)/B(A)] < (% — #-9)/2}. 
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This corollary includes and generalizes the ordinary Rayleigh quotient 
theorem. In fact, if A is normal, the right member of (17) vanishes and the 
normalization is unimportant. Hence for any normal matrix, 


{A: Re[(exp7#) (A — Ho1/Moo)] S O} 


defines an inclusion domain. This states that any line through /9;/4o9 separates 
the complex plane into two half-planes, the line being common to both, such that 
each half-plane is an inclusion domain. For hermitian matrices it is sufficient 
to consider only the real axis. For unitary matrices two arcs of the unit circle 
are obtained. 

{f «(A) and B(A) are arbitrary polynomials, «(A) can be orthogonalized with 
respect to B(A) (or vice versa). This gives finally the inclusion domains 


m = (B(A) x)" a(A) x/||B(A) x|[? 
ye —llelddalP _ jy2 





(. a (2) — (m + 8) B(2) s+ hie 
| — ma) B (a) +302) | — 


|B (A) ~|/? 
and 
{a: Re [(exp i) (s+ m) < (x — >) yh. 


5. As might be expected, the theorems for normal matrices are especially 
simple. Indeed, since x = 1, the closed complement of any inclusion set is also 
an inclusion set, the common boundary locus being characterized by the 
equality sign in the inequalities. This follows from replacing 6 by — 6/|6|? or 
? by +2. Thus, for normal matrices and only for these, the inclusion theorems 
become two-set separation theorems. For separation theorems hereafter only 
the separation locus will be given. If A is normal, so are «(A) and 6(A), whether 
normalized or not. Moreover, if 


(18) 1(A, A) = Xoo t+ Xor4t+ moAt:: 


is any polynomial in A and A, then since A and A” are commutative it follows 
that 7(A”, A) is also normal. The corollary, as applied to normal matrices, can 
be restated as follows: 


Re [Aexpi#] = Re[(exp7#) y4 A y/y" y] 


defines, with any #, and any y +0, a separation locus for A. Let this be applied 
to the matrix 
[a*(A)]¥ 4(A%, A) a(A), 
and set 
y =a(A) x. 

This leads to 

Theorem III. Let y(A, A) and «(A) be polynomials (18) and (8), let A be normal, 
and x +0. Then for any angle #3, 


{a: Re[(expi) x (4, a)/|a(4)|*] = Re[(expid) x” y(A™, A) x/(a(A) x)# («(A) x)]} 
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defines a separation locus for A. This is to say that it defines a locus that separates 
the complex plane into two sets, with the locus common to both, each of which is an 
inclusion set. 


6. Since the denominators in the inequality are real and positive, there is 
no loss in generality in supposing «(A)=1, and also none in supposing the right 
member to be zero, since it is necessary only to clear of fractions and transfer 
all terms to the left to bring this about. Hence the statement just given is no 
more general than the following: If (A, 4), as defined by (18), satisfies 


(19) x4 y(A4, A)x=O0= f x(A, A) dw(A), 
then for any angle # the locus 
(20) {A: Re[(expi#) x (A, 4)] = 0} 


separates the complex plane into two inclusion sets, the locus being contained 
in each. Hence any polynomial y satisfying (19) will be called a separation 
polynomial. Evidently any linear combination of separation polynomials is again 
a separation polynomial. 

By application of standard orthogonalizing methods it is possible to develop 
a sequence of mutually orthogonal polynomials 


Po(A)=1, (A) =A — MoilMoo, 
Hoo Foi --+ Ho» 
(21) g(a)=det} = det Ma, 


Hy-1,0 My-1,1 ih My-1,» 


1 A a” 


each with leading coefficient unity, and each of these, except for gp, is also a 
separation polynomial. The corresponding vectors are 


(22) pb, = 9, (A) x, 
with 
(23) |. 1/2? = 4. — me! M-*m,, 
where 
M,_; m, 
= (nt aa 


For nonnormal matrices the normalized polynomials and vectors may be more 
convenient, 


(24) Yo (A) = Po(A)/|| Pp || » | y, (A) = Pol |lPol - 


The vectors g, or p,, and hence the polynomials y, or y,, can be computed re- 
cursively [2]. In this way it is possible to avoid the progressive loss of leading 
figures that would result from the direct use of the higher moments [19]. Hence 


(25) Jf ) Yol(A) der(A) = 5,5. 
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Define the kernel polynomials 
Hoo Hoi-+-Hoa 1 


Mio Mi1-++tae B 
om %q(A,)=—Moodet{ . . . . [det M, 
Moo Mo1-++Hoo Th 
. Lew ® 
= flog (1, A,...,4°) Mo" (1,2, ..., ue)”, 


where the latter form is obtained from a standard determinantal identity. Note 
that 





Xo (A, Lt) = Hy (uM, A) . 
It can be verified directly that 


Jx,(A,u) Aedw(A)=p?, ea, 


and hence that x,(A,“) is an identity operator for polynomials of degree o or 
less. This is to say that if «(A) is of degree not greater than o, then 


(27) J %,(A, u) (A) dw (A) = a(n). 


In particular, if y(A) is of degree less than o, then 


J (A, mu) y (A) (A — w) dw(a) = 0; 


moreover, 
(28) fxn) %(A,u!) do(d)=%(un'), eS. 
From these identities, and from 
fu,(A, A) dw(a4) =o +1, 


it follows that the following are separation polynomials: 


%_(A, A) — (0 +1), 
H(A, 14) ~~ 1, 
(A, 1) y (A) (A — pt) , 





Other separation polynomials are 





P(A) Po(A), OG, 


(A) %o(A), 


for y a polynomial of degree less than o. 
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7. If x(A, 4) is a separation polynomial, and 
4 (A", A) x +0, 


then «x is not a characteristic vector of y(A”, A). Hence 0 is not a characteristic 
root of ¥(A”, A) and if not interior to its field of values, can at worst lie on a 
side. Therefore, if t=exp7#, then for every # with at most a single exception, 
the locus 


{A:Re[t x (A, a)] = 0} 


can itself be excluded from the inclusion domains. This is to say that except 
for possibly a single value of #, an inclusion domain is defined by 


Re[t (A, A)] <0. 


For hermitian matrices when (A, 4)=£(A), and for any matrix when =f, 
it is not possible for #=0 to be an exceptional angle. 


Now consider any of the orthogonal polynomials gy, (A), and let it be factored 
in any way in the form 


(29) Po (A) _ Yo,1 (A) Vo,2 (A) ’ 


where y,, and y, are both polynomials and neither one a constant. Then 


X(A, A) = Yo, 2(A) Ga (A) =| ¥o,2(A)|? ¥0,1 (A) 
is a separation polynomial, and if y,(A)x=0, then also ¥(A”, A)x=+0. Hence 


Theorem IV. If the orthogonal polynomial ,(A) is factored as in (29) into 
two polynomial factors, neither one a constant, and t=exp1%, then for any angle 
® other than possibly a single exceptional one, the locus 


{a:Re[ry,,1(A)] =0} 


separates the complex plane into two open inclusion domains. 


In particular, therefore, if y, ,(A) is linear, this shows that all zeros of the 
orthogonal polynomials gy, are contained in the field of values. In the hermitian 
case, by taking appropriate linear and quadratic factors the result is BUECKNER’S 
theorem that if x) and x, are linearly independent, then the zeros of gy, (A) separate 
the real axis into o+1 open sets (intervals and half-lines), each of which is an 
inclusion set. 


8. Consider next the separation polynomial 
%q(A, 4) y(A) (A — 4p), 
where y(A) is of degree a —1 or less. Let A\’) be any zero of x,(A, 4), and choose 
7 (A) = %5(A, u)/(A — ay”). 
Then except for a nonnegative real factor, the above separation polynomial is 


(A — Al)) (A —y). 


Numer. Math. Bd. 1 4 








50 F. L. Bauer and A. S. HOUSEHOLDER: 


Hence the circle through u and 4‘) = 49) (u), 
{A: Re [x (A — A@) (A —u)] = 0} 


separates the complex plane into two inclusion sets for any t=exp7#, and the 
separation is strict for every # with the possible exception of a single 0, provided 


z(A¥, ft) y(A) (A — wl) x +0. 
In the hermitian case this theorem has been announced by WIELANDT. Again 


y (A) could be any other divisor of x, (A, u). 


For o=1 there is a single circle through uw and 


At) = w (MH) = (411 — Hor )/ (410 — Hoo) - 


The fraction on the right will be called the Temple quotient, and the set of all 
possible values, for fixed ~ and varying x, will be called the Temple field of 
values for uw [20]. Any circle through w and w() is a separation curve associated 
with the particular x. 


If uJ —A is nonsingular, then 





ma -1 Eoo— tro als y# (uI—A)"*y 
[mu — w(x) | BE Moo Hy oF Mort M11 adh 
y = (u I — A) x. 


Consequently the values of [4 — w (u) ]“lie in the convex hull of the roots (u — A,)+ 
of (uJ — A). The field of values itself for ~ is bounded by arcs which go through 
pb and two roots of A. It is, in fact, the intersection of all those closed circular 
domains, each determined by wu and a pair of roots, but containing all other roots. 


The separation circles that arise in the quadratic case can be written, for any 
scalar y, 


Mol ‘- y|? = Moola1 — Hoofor + | ox ons Y Hool?- 


This can be written also in the form 


Hoo| ae y|? - [| eas — Y Mrol” + YY (Moo a1 — Mor l0) J/Ha1- 


From the first form it appears that the circle has minimal absolute radius when 
Y=Hoil@oo, that is when the center is at the Rayleigh quotient. This, for A 
hermitian, is the Weinstein theorem [26]. From the second form it appears that 
the circle has minimal relative radius when y= 44/1449. 


It perhaps goes without saying that if the vector iterates are numbered in 
reverse order they can be regarded as obtained by iterating with A+, and inclusion 
intervals can be obtained for the Aj’. Nothing new results except that the criteria 
for minimization are changed. Thus the minimal inclusion circle for the reciprocals 
has its center at 49/41. 
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9. To return to the general case, the notion of separation polynomial no longer 
applies, but the polynomials g,(A) do have a property of interest. 


Theorem V. Among all polynomials «(A) of degree v and leading coefficient 
unity, the polynomial y,(A) has minimal norm, hence minimizes the right member 


of (41) for given B(A). 
The proof is made by observing that for any «(A), 


|x (A) «||? =p,, + mi a+a"m, +a" M,_,a 
= ly, — my’ My m, + (m, + M,_,a)" M,—4(m, + M,_@). 
Since M,_, is positive definite, this quantity is minimized by 
a=—M-\im,, 


which defines y,, the minimum value being || p,(A) x||/||8(A) ||. For y=1 and 
A hermitian, again the Weinstein theorem appears and the inclusion domain is 
truly minimized. For v>1, let 


v 


a(a) = [] (A —a,) 


1 


and assume £(A)=1. Then the union of the » circles 
1 
|A — a] S|]a(A) x|[* 


is a (weaker) inclusion domain, and obviously the radii of these circles are mini- 
mized. In this sense, the theorem gives an optimal inclusion domain. 


So far the moments of only a single vector x have been utilized. However, 
since the vector x occurs only on the right of (11), it is natural to allow x to vary 
in some subspace and to seek a minimum. Naturally if x were allowed to vary 
over the entire space the minimum would occur when ~ is a characteristic vector 
of A, and it is to be expected that when a subspace is introduced a characteristic 
value problem of lower order would arise. 


Hence let 
(30) x=Xw. 


Then the minimum coefficient of x in (11), and the maximum coefficient of x7} 
in (10), are equal to the least and greatest positive 4 satisfying 


det [(B (A) X)" (B(A) X) — a*(a(A(X)* (a(A) X)] =. 
To transform the problem slightly, the characteristic value problem to be solved is 
(31)  (Bol,...,B,1) M (Bol, ...,B, 1)¥ w = a2 (aig I, ...,0,1) M(Gol, ...,%,1)% w. 


If the columns of X are made up of a certain number of iterates Ka, & Sg, o<- 
then the vector Xw is of the form 


y (A) Xp. 
4* 
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Hence the problem reduces to that of finding a y(A) of specified degree o that, 
for fixed « and f, minimizes 


||= (A) (A) xoll/I1B (A) 7 (A) %oll - 


In this case the matrix of the submatrices M;,; has identical rows and columns. 
If these are removed, the matrix collapses, and the problem has the form 


(32) RIMRc=2#SYM Sc, 


where c is the vector of coefficients of y(A), and 


m8 ©... By 0 0... 
R= ay Xo 10) eee Sa B, Bo (8) eee 
Ag Hy Xp --- Bs By Bo --- 


are matrices of a+ 1 columns. The matrix M is here the collapsed, nonsingular 
matrix. Minimization in a subspace was considered first by LEHMANN [16] in 
connection with the Temple quotient. 


The theorems of CoLLatz [4], WALKER and WESTON [23], and Bartscu [J], 
are weaker formulations of Theorem I with linear « and constant 8, as WIELANDT 
has shown. However, they lead to simpler computations. 


The authors make grateful acknowledgement to Professor WIELANDT for 
making his notes available to them in advance of their publication [37]. 
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54 Berichtigung: G. BERTRAM 


Berichtigung 
zu meiner Arbeit ,,Verscharfung einer Fehlerabschatzung ...‘‘, Numer. Math. 1, 
135—141 (1959). 
Die Zahlen des Beispiels auf S. 141 miissen heiBen: 
2. Zeile: |y—y,4| S0,1036 
6.—11. Zeile: 
0,0521 fir #=0 und #*=1 0,0533 fiir *=0,3 und #=0,7 
lyv—¥,| $4 0,0527 fir x=0,1 und *#=0,9 | 0,0535 fiir *=0,4 und +=0,6 
0,0530 fiir *=0,2 und *=0,8 | 0,0535 fiir *=0,5 
Diese Abschatzung ist also ungefahr um den Faktor 2 (nicht 6) besser. Der strenge 
Fehler ist bei x =0,075 etwa 0,0030. 


Ich danke Herrn B6érscu-Supan fiir den Hinweis auf diese das Ergebnis ent- 
stellenden Fehler. 


G. BERTRAM 
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Zweck des Taschenbuches ist die Zusammenstellung erprobter Algorithmen fiir Rechnungen 
verschiedenster Art: Lésung endlicher Gleichungen und Funktionalgleichungen, numerische 
Approximationen, Berechnung spezieller Funktionen usw. Die Algorithmen werden in der 
Symbolsprache ALGOL abgefaBt und sind daher mit geeignetem Formeliibersetzer fiir alle 
Rechenmaschinen verwendbar. Auch ohne Formeliibersetzer kénnen sie als Programmie- 
rungsmodelle benutzt werden. Der Text der Bande ist englisch. 


Zur Zeit sind fiir die einzelnen Bande folgende Themen vorgesehen: Band 1a beschreibt 
die Symbolsprache ALGOL und ihren Gebrauch, Band 1b die Struktur der Formeliibersetzer. 
Diese einfiihrenden Bande enthalten damit als einzige nicht vorwiegend Algorithmen. Band 2 
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Algorithmen veréffentlicht. 


Vor Erscheinen der einzelnen Bande werden die Algorithmen jeweils in Teilbeitragen inner- 
halb der Zeitschrift ,,Numerische Mathematik“ verédffentlicht, um sie médglichst bald der 
Allgemeinheit zuganglich zu machen und um vor der endgiiltigen Fassung des Bandes eventuell 
noch zusatzliche Informationen und Berichtigungen von Lesern zu erhalten. 


Beitrage zum Taschenbuch werden freundlichst erbeten. Allerdings kénnen zunachst nur 
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